Dear Anton,
Many thanks for your help!
I am reviewing the results.
phpBB to LLM
- Antonio Linares
- Site Admin
- Posts: 42393
- Joined: Thu Oct 06, 2005 5:47 pm
- Location: Spain
- Has thanked: 9 times
- Been thanked: 41 times
- Contact:
Re: phpBB to LLM
Here is run.py to test the fine-tuned model:
run.py
Code:
from transformers import GPT2LMHeadModel, GPT2Tokenizer
import torch
# Load the fine-tuned GPT-2 model and tokenizer
fine_tuned_model_path = "./fine-tuned-model"
model = GPT2LMHeadModel.from_pretrained(fine_tuned_model_path)
tokenizer = GPT2Tokenizer.from_pretrained(fine_tuned_model_path)
model.eval()  # inference mode (disables dropout)
# Input prompt for text generation
prompt = "what is a star ?"
# Tokenize the prompt; this also returns the matching attention mask
inputs = tokenizer(prompt, return_tensors="pt")
max_new_tokens = 50
# Generate text with beam search; GPT-2 has no pad token, so EOS is reused for padding
with torch.no_grad():
    output = model.generate(
        inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        pad_token_id=tokenizer.eos_token_id,
        max_new_tokens=max_new_tokens,
        num_beams=5,
        no_repeat_ngram_size=2,
    )
# Decode the generated tokens back to text
generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
# Print the generated text
print("Generated Text:", generated_text)
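The `no_repeat_ngram_size=2` setting in run.py keeps the model from producing the same pair of consecutive tokens twice. The rule it enforces can be sketched in plain Python as below. This is only an illustrative re-implementation of the idea, not the transformers source, and the function name `banned_next_tokens` is hypothetical.

```python
def banned_next_tokens(tokens, ngram_size=2):
    """Return the tokens that would recreate an n-gram already in `tokens`."""
    if ngram_size < 1 or len(tokens) < ngram_size:
        return set()  # no complete n-gram exists yet, nothing to ban
    # The last (n - 1) tokens form the prefix of the n-gram about to be completed
    prefix = tuple(tokens[len(tokens) - (ngram_size - 1):])
    banned = set()
    for i in range(len(tokens) - ngram_size + 1):
        ngram = tuple(tokens[i:i + ngram_size])
        if ngram[:-1] == prefix:
            banned.add(ngram[-1])  # emitting this token would repeat the n-gram
    return banned


# Words stand in for token ids to keep the example readable
print(banned_next_tokens(["a", "star", "is", "a"]))  # {'star'}
```

During generation, tokens in the banned set get their score forced to minus infinity, so beam search can never pick them at that step.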
Re: phpBB to LLM
From posts.dbf and posts.fpt we now generate the dataset.json file to be used for training. It is basically a list of the topics, each one with the topic name and the list of messages for that topic. We are using just 20 different topics, so the dataset is not too large and we can run quicker tests with it:
dataset.prg
Code:
#include "FiveWin.ch"

request dbfcdx

function Main()

   local aPosts := {}, n

   USE posts VIA "dbfcdx"
   // Order the posts by topic, then chronologically inside each topic
   INDEX ON posts->topic + posts->date + posts->time + posts->forum TO subject
   GO TOP

   // Collect the first 20 topics, stopping early if the table runs out
   for n = 1 to 20
      if Eof()
         exit
      endif
      AAdd( aPosts, GetTopic() )
   next

   hb_memoWrit( "dataset.json", hb_jsonEncode( aPosts ) )
   XBrowser( aPosts )

return nil

// Returns a hash with the topic title and all its consecutive posts
function GetTopic()

   local hTopic := {=>}, cTopic := RTrim( posts->topic )

   hTopic[ "topic" ] = cTopic
   hTopic[ "messages" ] = {}

   // Compare trimmed values: "==" is an exact comparison, so the padded
   // field contents would never equal the trimmed cTopic
   while ! Eof() .and. RTrim( posts->topic ) == cTopic
      AAdd( hTopic[ "messages" ], GetPost() )
      SKIP
   end

return hTopic

// Returns the current record as a hash
function GetPost()

   local hPost := {=>}

   hPost[ "topic" ]    = RTrim( posts->topic )
   hPost[ "forum" ]    = RTrim( posts->forum )
   hPost[ "username" ] = RTrim( posts->username )
   hPost[ "date" ]     = posts->date
   hPost[ "time" ]     = posts->time
   hPost[ "text" ]     = posts->text

return hPost
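For anyone more at home in Python, the grouping that GetTopic() performs (walk the index-ordered records and collect consecutive rows sharing the same topic) can be sketched with itertools.groupby. The sample records are made up for illustration, and `build_dataset` is a hypothetical name, not part of the original tooling.

```python
import json
from itertools import groupby


def build_dataset(records, max_topics=20):
    """Group post records into the dataset.json layout: a list of topics."""
    # Same ordering as the INDEX ON expression: topic, date, time, forum
    ordered = sorted(records, key=lambda r: (r["topic"], r["date"], r["time"], r["forum"]))
    dataset = []
    # groupby only merges *consecutive* equal keys, hence the sort above
    for topic, posts in groupby(ordered, key=lambda r: r["topic"]):
        if len(dataset) == max_topics:
            break
        dataset.append({"topic": topic, "messages": list(posts)})
    return dataset


# Hypothetical sample records (field names taken from dataset.prg)
sample = [
    {"topic": "B topic", "forum": "forum 1", "username": "user2",
     "date": "2024-01-03", "time": "09:00", "text": "only post"},
    {"topic": "A topic", "forum": "forum 1", "username": "user1",
     "date": "2024-01-02", "time": "12:00", "text": "reply"},
    {"topic": "A topic", "forum": "forum 1", "username": "user1",
     "date": "2024-01-01", "time": "08:30", "text": "first"},
]

print(json.dumps(build_dataset(sample), indent=1))
```

Sorting first and then grouping mirrors the INDEX ON plus SKIP loop in dataset.prg.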
The structure of the generated json file is as follows:
Code:
[
{ "topic": the title of the topic,
"messages":
[
{
"topic": the title of the topic,
"forum": the forum name,
"username": name of the author,
"date": date of the post,
"time": time of the post,
"text": text of the post
},
next posts for the same topic
]
},
next topic,
...
]
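This post does not show how the training script consumes dataset.json. As one hypothetical option, each topic could be flattened into a single plain-text sample before tokenization. The "Topic:" / "username:" layout and the function name `topics_to_texts` are assumptions for illustration only.

```python
import json


def topics_to_texts(json_text):
    """Flatten each topic of dataset.json into one plain-text training sample."""
    texts = []
    for topic in json.loads(json_text):
        lines = [f"Topic: {topic['topic']}"]
        # One line per message, in the order stored in the file
        for msg in topic["messages"]:
            lines.append(f"{msg['username']}: {msg['text']}")
        texts.append("\n".join(lines))
    return texts


example = ('[{"topic": "A topic", "messages": [{"topic": "A topic", "forum": "forum 1", '
           '"username": "user1", "date": "2024-01-01", "time": "08:30", "text": "first"}]}]')

for sample_text in topics_to_texts(example):
    print(sample_text)
```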
Re: phpBB to LLM
I have edited the first post of this topic with the correct instructions:
https://fivetechsupport.com/forums/view ... 4b#p266364