FiveTech Software tech support forums

by **Antonio Linares** » Mon Dec 25, 2023 12:52 pm

This code seems to work, though we still don't know how long the training will take.

If you have a PC with NVIDIA GPUs, I would appreciate it if you could test it and report how long it takes.

Just replace "fivetech_forums_20231222.sql" with any large text file you have. Thanks!

train.py

```python
import torch  # transformers' PyTorch models require torch to be installed
from transformers import GPT2LMHeadModel, GPT2Tokenizer
from transformers import TextDataset, DataCollatorForLanguageModeling
from transformers import Trainer, TrainingArguments

# Load pre-trained GPT-2 model and tokenizer
model_name = "gpt2"
model = GPT2LMHeadModel.from_pretrained(model_name)
tokenizer = GPT2Tokenizer.from_pretrained(model_name)
tokenizer.add_special_tokens({'pad_token': '[PAD]'})
# The new [PAD] token enlarges the vocabulary, so the embedding matrix must grow with it
model.resize_token_embeddings(len(tokenizer))

# Your additional training data (any large UTF-8 text file)
file_path = "fivetech_forums_20231222.sql"

# Create a PyTorch Dataset. TextDataset reads and tokenizes the file itself and
# caches the result, so a separate read/tokenize pass over the whole file is not needed
train_dataset = TextDataset(
    tokenizer=tokenizer,
    file_path=file_path,
    block_size=128  # Adjust the block size based on your dataset
)

# Create a data collator
data_collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer,
    mlm=False  # GPT-2 is a causal (next-token) model, so no masked language modeling
)

# Configure training arguments
training_args = TrainingArguments(
    output_dir="./fine-tuned-model",
    overwrite_output_dir=True,
    num_train_epochs=3,  # Adjust the number of epochs based on your dataset
    per_device_train_batch_size=4,
    save_steps=10_000,
    save_total_limit=2,
    logging_dir="./logs",
)

# Initialize Trainer
trainer = Trainer(
    model=model,
    args=training_args,
    data_collator=data_collator,
    train_dataset=train_dataset,
)

# Fine-tune the model
trainer.train()

# Save the fine-tuned model
model.save_pretrained("./fine-tuned-model")
tokenizer.save_pretrained("./fine-tuned-model")
```

by **Antonio Linares** » Mon Dec 25, 2023 1:51 pm

Do you have a PC with a powerful NVIDIA card?

If so, would you be willing to help by running some AI training tests? Please let me know.

Many thanks.

by **alerchster** » Mon Dec 25, 2023 2:54 pm

If an NVIDIA RTX 2080 Ti is enough, I'd like to support the testing.

by **Antonio Linares** » Mon Dec 25, 2023 5:22 pm

Dear Anton,

That one would be great!

Could you please run the previous Python code and see how long it reports?

```
python train.py
```

I am emailing you the file.

Many thanks!

by **alerchster** » Mon Dec 25, 2023 6:04 pm

```
========================== RESTART: C:\fwh\AI\train.py =========================
Traceback (most recent call last):
  File "C:\fwh\AI\train.py", line 1, in <module>
    import torch
ModuleNotFoundError: No module named 'torch'
```

by **Antonio Linares** » Mon Dec 25, 2023 6:06 pm

Run this:

```
pip install torch
```

and then please try again.

by **alerchster** » Mon Dec 25, 2023 6:13 pm

```
C:\fwh\AI>pip install torch
ERROR: Could not find a version that satisfies the requirement torch (from versions: none)
ERROR: No matching distribution found for torch

C:\fwh\AI>
```

by **Antonio Linares** » Mon Dec 25, 2023 6:21 pm

If you're running into issues installing PyTorch with pip install torch, there could be several reasons. Here are a few steps you can take to resolve the issue:

Check Python Version:
Ensure that you are using a compatible version of Python. PyTorch may have specific requirements for the Python version. As of my last knowledge update in January 2022, PyTorch 1.10.0 supports Python 3.6, 3.7, 3.8, and 3.9. You can check the PyTorch website for the latest compatibility information.

Use the Correct Pip Command:
Make sure you use the right pip command for your system and for whether you want the CPU or the GPU (CUDA) build of PyTorch. The install selector on the PyTorch website generates the matching command for your OS, package manager and compute platform.

Upgrade Pip:
Ensure that your pip version is up to date. You can upgrade pip with the following command (on Windows, pip can only upgrade itself when run as python -m pip):

```
python -m pip install --upgrade pip
```

Check Internet Connection:
Ensure that your internet connection is stable and not blocking the connection to the PyTorch servers. If you are behind a proxy, you may need to configure your proxy settings.

Firewall/Antivirus:
Check if your firewall or antivirus software is blocking the connection. Temporarily disabling them for the installation process might help.

Conda (Optional):
If you're still having trouble, consider using conda, another package manager. You can create a new conda environment and install PyTorch with the following commands:

```
conda create -n myenv python=3.8
conda activate myenv
conda install pytorch==1.10.0 torchvision==0.11.2 torchaudio==0.10.0 -c pytorch
```

Remember to replace myenv with your desired environment name.

If you are still facing issues after trying these steps, please provide more details about your operating system, Python version, and any error messages you receive so that I can offer more specific assistance.

by **alerchster** » Mon Dec 25, 2023 6:42 pm

Python is 3.12, pip is up to date and the antivirus is stopped. Any ideas?

```
C:\fwh\AI>pip install pytorch
Collecting pytorch
Using cached pytorch-1.0.2.tar.gz (689 bytes)
Installing build dependencies ... done
Getting requirements to build wheel ... done
Preparing metadata (pyproject.toml) ... done
Building wheels for collected packages: pytorch
Building wheel for pytorch (pyproject.toml) ... error
error: subprocess-exited-with-error

× Building wheel for pytorch (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> [20 lines of output]
Traceback (most recent call last):
File "C:\Users\alerc\AppData\Local\Programs\Python\Python312\Lib\site-packages\pip\_vendor\pyproject_hooks\_in_process\_in_process.py", line 353, in <module>
main()
File "C:\Users\alerc\AppData\Local\Programs\Python\Python312\Lib\site-packages\pip\_vendor\pyproject_hooks\_in_process\_in_process.py", line 335, in main
json_out['return_val'] = hook(**hook_input['kwargs'])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\alerc\AppData\Local\Programs\Python\Python312\Lib\site-packages\pip\_vendor\pyproject_hooks\_in_process\_in_process.py", line 251, in build_wheel
return _build_backend().build_wheel(wheel_directory, config_settings,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\alerc\AppData\Local\Temp\pip-build-env-k67z_tif\overlay\Lib\site-packages\setuptools\build_meta.py", line 404, in build_wheel
return self._build_with_temp_dir(
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\alerc\AppData\Local\Temp\pip-build-env-k67z_tif\overlay\Lib\site-packages\setuptools\build_meta.py", line 389, in _build_with_temp_dir
self.run_setup()
File "C:\Users\alerc\AppData\Local\Temp\pip-build-env-k67z_tif\overlay\Lib\site-packages\setuptools\build_meta.py", line 480, in run_setup
super(_BuildMetaLegacyBackend, self).run_setup(setup_script=setup_script)
File "C:\Users\alerc\AppData\Local\Temp\pip-build-env-k67z_tif\overlay\Lib\site-packages\setuptools\build_meta.py", line 311, in run_setup
exec(code, locals())
File "<string>", line 15, in <module>
Exception: You tried to install "pytorch". The package named for PyTorch is "torch"
[end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for pytorch
Failed to build pytorch
ERROR: Could not build wheels for pytorch, which is required to install pyproject.toml-based projects

C:\fwh\AI>
```

by **alerchster** » Mon Dec 25, 2023 7:11 pm

I think I should downgrade Python to 3.11:

https://stackoverflow.com/questions/77225812/is-there-a-way-to-install-pytorch-on-python-3-12-0

Here is the output with conda:

```
C:\fwh\AI>conda install pytorch==1.10.0 torchvision==0.11.2 torchaudio==0.10.0 -c pytorch
Channels:
- pytorch
- defaults
Platform: win-64
Collecting package metadata (repodata.json): done
Solving environment: | warning libmamba Added empty dependency for problem type SOLVER_RULE_UPDATE
failed

LibMambaUnsatisfiableError: Encountered problems while solving:
- package torchvision-0.11.2-py36_cpu requires python >=3.6,<3.7.0a0, but none of the providers can be installed

Could not solve for environment specs
The following packages are incompatible
├─ __cuda is requested and can be installed;
├─ pin-1 is installable and it requires
│ └─ python 3.12.* , which can be installed;
└─ torchvision 0.11.2 is not installable because there are no viable options
├─ torchvision 0.11.2 would require
│ └─ python >=3.6,<3.7.0a0 , which conflicts with any installable versions previously reported;
├─ torchvision 0.11.2 would require
│ └─ cudatoolkit >=11.3,<11.4 , which requires
│ └─ __cuda >=11.3 , which conflicts with any installable versions previously reported;
├─ torchvision 0.11.2 would require
│ └─ python >=3.7,<3.8.0a0 , which conflicts with any installable versions previously reported;
├─ torchvision 0.11.2 would require
│ └─ python >=3.8,<3.9.0a0 , which conflicts with any installable versions previously reported;
└─ torchvision 0.11.2 would require
└─ python >=3.9,<3.10.0a0 , which conflicts with any installable versions previously reported.

Pins seem to be involved in the conflict. Currently pinned specs:
- python 3.12.* (labeled as 'pin-1')
```
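The pip and conda failures above point to the cause the linked Stack Overflow question describes: the interpreter was Python 3.12, for which torch could not be installed at the time. A small diagnostic like the following, a minimal sketch rather than part of the original thread, prints which interpreter is actually running (useful on Windows, where IDLE and the command prompt may use different Python installations), its version, and whether torch is importable with CUDA support. The function name `check_environment` is hypothetical; `torch.cuda.is_available()` and `torch.version.cuda` are real torch attributes.

```python
import importlib.util
import platform
import sys


def check_environment():
    """Collect interpreter and torch/CUDA facts relevant to the install problems above."""
    report = {
        "executable": sys.executable,  # which python.exe is running this script
        "python": platform.python_version(),
        "torch_installed": importlib.util.find_spec("torch") is not None,
    }
    if report["torch_installed"]:
        import torch  # imported lazily so the check also runs when torch is missing

        report["torch"] = torch.__version__
        # None for CPU-only builds, a CUDA version string for CUDA builds
        report["torch_cuda_build"] = torch.version.cuda
        report["cuda_available"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

Running it from the same prompt where pip install is run (for example as a hypothetical check_env.py) helps confirm that pip and train.py use the same interpreter.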

by **alerchster** » Mon Dec 25, 2023 8:04 pm

Installed Python 3.11.7, then ran:

```
pip install torch
pip install transformers
python train.py
```

Result: finished after 16 minutes, possibly with some errors; I couldn't save the error messages.

If I run train.py again, I get this terrible estimate:

```
| 213/1628301 [05:57<1293:44:33, 2.86s/it]
```

I can't wait that long.
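The progress bar numbers follow from simple arithmetic. A minimal sketch, assuming a single device, no gradient accumulation (the script configures neither) and that Trainer runs ceil(blocks / batch size) steps per epoch: with 3 epochs and a batch size of 4, 1,628,301 steps means at most 2,171,068 blocks of 128 tokens (roughly 278 million tokens), and at 2.86 s per step the remaining 1,628,088 steps take about 1,293 hours, close to 54 days. The helper names are hypothetical.

```python
import math


def total_training_steps(num_blocks, batch_size, epochs):
    """Steps on one device without gradient accumulation: batches per epoch x epochs."""
    # The last, partial batch of each epoch still counts as a step, hence ceil
    return math.ceil(num_blocks / batch_size) * epochs


def max_blocks_from_steps(total_steps, batch_size, epochs):
    """Invert total_training_steps: upper bound on the number of training blocks."""
    return (total_steps // epochs) * batch_size


def eta_hours(total_steps, done_steps, seconds_per_step):
    """Remaining time if every remaining step takes seconds_per_step."""
    return (total_steps - done_steps) * seconds_per_step / 3600


if __name__ == "__main__":
    steps = 1_628_301                              # from the progress bar
    blocks = max_blocks_from_steps(steps, 4, 3)    # per_device_train_batch_size=4, 3 epochs
    tokens = blocks * 128                          # block_size=128
    print(blocks, tokens, round(eta_hours(steps, 213, 2.86), 1))
    # prints: 2171068 277896704 1293.4
```

The step count scales linearly with the number of epochs and the amount of text, so the two levers are fewer steps or fewer seconds per step.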

by **Antonio Linares** » Tue Dec 26, 2023 12:09 am

Dear Anton,

> result: ready in 16min maybe with some errors - error messages can't be saved there!

Can you please send me the generated "fine-tuned-model" files using the Wormhole app?

https://wormhole.app/

Many thanks!

by **alerchster** » Tue Dec 26, 2023 6:32 am

Hello Antonio,

From the looks of it, the training doesn't happen in a single pass. The first pass generated a 715 MB file, "cached_lm_GPT2Tokenizer_128_fivetech_forums_20231222.sql", in 16 minutes. The second pass is the actual training: I'm still at 0% after 47 minutes, and the estimated remaining time is 3,900 hours, a value that keeps growing.

The fine-tuned-model directory is still empty.
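The 16-minute first pass is TextDataset tokenizing the whole file and caching the result. The sketch below is a simplified approximation of that behaviour, not its actual source: tokenize the text once, cut the token ids into contiguous, non-overlapping blocks of block_size (dropping the short remainder), and pickle them into a cached_lm_<tokenizer>_<block_size>_<filename> file next to the input, which matches the file name reported above. `build_or_load_blocks` and its `tokenize` callable are hypothetical stand-ins for the GPT-2 tokenizer; the real class, with its default `overwrite_cache=False`, loads an existing cache file instead of re-tokenizing.

```python
import os
import pickle
from typing import Callable, List


def build_or_load_blocks(
    file_path: str,
    block_size: int,
    tokenize: Callable[[str], List[int]],
    tokenizer_name: str = "GPT2Tokenizer",
) -> List[List[int]]:
    """Tokenize once, cut fixed-size blocks, cache them; reuse the cache on later runs."""
    directory, filename = os.path.split(file_path)
    # Same naming pattern as the 715 MB cache file seen in the thread
    cache_path = os.path.join(directory, f"cached_lm_{tokenizer_name}_{block_size}_{filename}")

    if os.path.exists(cache_path):
        # Later runs: skip the slow tokenization and load the pickled blocks
        with open(cache_path, "rb") as handle:
            return pickle.load(handle)

    with open(file_path, encoding="utf-8") as f:
        token_ids = tokenize(f.read())

    # Contiguous, non-overlapping blocks; a trailing remainder shorter than block_size is dropped
    blocks = [
        token_ids[i : i + block_size]
        for i in range(0, len(token_ids) - block_size + 1, block_size)
    ]
    with open(cache_path, "wb") as handle:
        pickle.dump(blocks, handle, protocol=pickle.HIGHEST_PROTOCOL)
    return blocks
```

For example, with a toy character-code tokenizer, `build_or_load_blocks(path, 4, lambda t: [ord(c) for c in t])` on a file containing "abcdefghij" yields two blocks and drops "ij". A second call with the same file and block size reads the cache without tokenizing again.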

by **Antonio Linares** » Tue Dec 26, 2023 7:16 am

Dear Anton,

Many thanks for your very valuable feedback!

by **alerchster** » Wed Dec 27, 2023 8:33 am

I've only been working with Python for two days!

Desktop PC with Windows 11 Pro:

- Processor: Intel(R) Core(TM) i9-10920X CPU @ 3.50GHz, 3504 MHz, 12 cores, 24 logical processors
- Storage: Samsung SSD 970 EVO Plus 1TB
- RAM: 64 GB
- GPU: NVIDIA RTX 2080 Ti

First attempt, without GPU support (because I didn't know any better):

- Installed Python: first 3.12 (no torch build was available for it), then downgraded to 3.11.7
- Installed torch
- Installed transformers
- Ran train.py: CPU utilization > 10% and 14.4 GB RAM
- After 16 minutes, cached_lm_GPT2Tokenizer_128_fivetech_forums_20231222.sql was prepared; only then does the training begin. We always have this overhead, and currently it runs only on the CPU.
- After 8 hours of training and 3,400 runs: no output in fine-tuned-model, and the estimated remaining time is > 1 year
- Process cancelled manually.

Second attempt, with GPU support:

- Installed CUDA Toolkit 12.3
- Uninstalled torch and reinstalled it:

```
pip uninstall torch
pip cache purge
pip install torch -f https://download.pytorch.org/whl/torch_stable.html
```

- Checked that torch sees the GPU:

```
torch.cuda.is_available()
True
```

- Ran train.py: CPU utilization 5% and 12.3 GB RAM
- After 7.5 hours of training and 3,860 runs: no output in fine-tuned-model, and the estimated remaining time is < 1 year
- Process cancelled manually.

Third attempt, with GPU support:

- Uninstalled and reinstalled transformers (this shouldn't make any difference or bring any improvement):

```
pip uninstall transformers
pip cache purge
pip install transformers
```

- In train.py, reduced save_steps to 500 so that some output should appear in fine-tuned-model.
- Ran train.py. Strikingly, performance drops sharply after around 300 runs: below 200 runs each run takes < 2.4 seconds, and by the 300th run it already takes 5 seconds.
- CPU utilization 10% and 12.3 GB RAM
- After 40 minutes of training and 500 runs: first output.
- Process cancelled manually.
- Checkpoint 500 has been transferred via wormhole.app.

Fourth attempt, with save_steps further reduced to 200:

- After 1.05 minutes of training, checkpoint 200 generated
- After 12.05 minutes, checkpoint 1200 generated
- After 18.21 minutes, checkpoint 2000 generated
- Checkpoints 1600 and 1800 will be transferred via wormhole.app.
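These timings allow a better estimate than the early progress bar. A minimal sketch of a linear extrapolation, assuming "18.21 minutes" means decimal minutes (reading it as 18 min 21 s barely changes the result) and that the average rate so far holds for the whole run, which the slowdown seen in the third attempt suggests it may not: 2000 runs in 18.21 minutes is about 0.55 s per run, so 1,628,301 runs would need roughly 247 hours, a bit over 10 days. `eta_from_progress` is a hypothetical helper.

```python
def eta_from_progress(elapsed_minutes, steps_done, total_steps):
    """Linear extrapolation from the average time per step observed so far."""
    seconds_per_step = elapsed_minutes * 60 / steps_done
    remaining_hours = (total_steps - steps_done) * seconds_per_step / 3600
    return seconds_per_step, remaining_hours


if __name__ == "__main__":
    # Fourth attempt: checkpoint 2000 after 18.21 minutes (read as decimal minutes)
    sps, hours = eta_from_progress(18.21, 2000, 1_628_301)
    print(f"{sps:.3f} s/step, about {hours:.0f} h ({hours / 24:.1f} days) remaining")
    # prints: 0.546 s/step, about 247 h (10.3 days) remaining
```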

But we need 1.6 million runs.

So save_steps needs to be optimized, and perhaps other settings too.
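Part of the save_steps cost is simply how many checkpoints get written, since every save writes the model weights and, by default, the optimizer state to disk. A minimal sketch, assuming Trainer saves a checkpoint-N folder at every multiple of save_steps (ignoring any extra save at the very end of training) and that save_total_limit keeps only the newest ones: at save_steps=200 the full 1,628,301-run job would write 8,141 checkpoints, against 162 with the original 10_000, while save_total_limit=2 means only the last two stay on disk. Both helper names are hypothetical.

```python
def checkpoints_written(total_steps, save_steps):
    """checkpoint-N folders written when saving at every multiple of save_steps."""
    return total_steps // save_steps


def retained_checkpoints(current_step, save_steps, save_total_limit):
    """Steps of the checkpoints still on disk: only the newest save_total_limit survive."""
    saved = list(range(save_steps, current_step + 1, save_steps))
    # save_total_limit=None means no rotation: every checkpoint is kept
    return saved[-save_total_limit:] if save_total_limit else saved


if __name__ == "__main__":
    total = 1_628_301
    print(checkpoints_written(total, 200))     # 8141 saves at save_steps=200
    print(checkpoints_written(total, 10_000))  # 162 saves at the original save_steps
    print(retained_checkpoints(2000, 200, 2))  # [1800, 2000]
```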


phpBB to LLM
