phpBB to LLM

User avatar
Antonio Linares
Site Admin
Posts: 42270
Joined: Thu Oct 06, 2005 5:47 pm
Location: Spain
Contact:

Re: phpBB to LLM

Post by Antonio Linares »

Dear Anton,

many thanks for your help!

I am reviewing the results :-)
regards, saludos

Antonio Linares
www.fivetechsoft.com

Re: phpBB to LLM

Post by Antonio Linares »

Here is run.py, a script to test the fine-tuned model:

run.py

from transformers import GPT2LMHeadModel, GPT2Tokenizer
import torch

# Load the fine-tuned GPT-2 model and tokenizer
fine_tuned_model_path = "./fine-tuned-model"
model = GPT2LMHeadModel.from_pretrained(fine_tuned_model_path)
tokenizer = GPT2Tokenizer.from_pretrained(fine_tuned_model_path)

# Input prompt for text generation
prompt = "what is a star ?"

# Tokenize the input prompt
input_ids = tokenizer.encode(prompt, return_tensors="pt")
attention_mask = torch.ones_like(input_ids)
pad_token_id = tokenizer.eos_token_id
max_new_tokens = 50

# Generate text using the fine-tuned model (beam search, no repeated bigrams)
output = model.generate(
    input_ids,
    attention_mask=attention_mask,
    pad_token_id=pad_token_id,
    max_new_tokens=max_new_tokens,
    num_beams=5,
    no_repeat_ngram_size=2,
)

# Decode the generated tokens back to text
generated_text = tokenizer.decode(output[0], skip_special_tokens=True)

# Print the generated text
print("Generated Text:", generated_text)
 
regards, saludos

Antonio Linares
www.fivetechsoft.com

Re: phpBB to LLM

Post by Antonio Linares »

From posts.dbf and posts.fpt we now generate dataset.json, the dataset used for training. We use just 20 different topics, so that the dataset stays small and we can run quicker tests with it:

dataset.prg

#include "FiveWin.ch"

request dbfcdx

function Main()

    local aPosts := {}, n

    USE posts VIA "dbfcdx"

    INDEX ON posts->topic + posts->date + posts->time + posts->forum TO subject
    GO TOP

    // collect up to 20 topics, stopping early if the table has fewer
    for n = 1 to 20
       if Eof()
          exit
       endif
       AAdd( aPosts, GetTopic() )
    next
    hb_memoWrit( "dataset.json", hb_jsonEncode( aPosts ) )
    XBrowser( aPosts )

return nil

function GetTopic()

    local hTopic := {=>}, cTopic := RTrim( posts->topic )

    hTopic[ "topic" ]    = RTrim( posts->topic ) 
    hTopic[ "messages" ] = {}

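    AAdd( hTopic[ "messages" ], GetPost() )
    SKIP
    // == is an exact match; assuming topic is a space-padded character field,
    // trim it before comparing, and stop at the end of the table
    while ! Eof() .and. RTrim( posts->topic ) == cTopic
       AAdd( hTopic[ "messages" ], GetPost() )
       SKIP
    end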

return hTopic    

function GetPost() 

    local hPost := {=>}

    hPost[ "topic" ]    = RTrim( posts->topic )
    hPost[ "forum" ]    = RTrim( posts->forum )
    hPost[ "username" ] = RTrim( posts->username )
    hPost[ "date" ]     = posts->date 
    hPost[ "time" ]     = posts->time
    hPost[ "text" ]     = posts->text

return hPost    

The generated JSON file has the following structure:

[
   {
      "topic": the title of the topic,
      "messages":
      [
         {
            "topic": the title of the topic,
            "forum": the forum name,
            "username": name of the author,
            "date": date of the post,
            "time": time of the post,
            "text": text of the post
         },
         next posts for the same topic
      ]
   },
   next topic,
   ...
]
In short, it is a list of topics: each entry holds the title of the topic and the list of messages posted in that topic.
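
As a rough illustration of how this structure could be consumed on the Python side, here is a minimal sketch that loads dataset.json and flattens each topic into one block of plain text. The function names (load_topics, topic_to_text, dataset_to_texts), the "Topic:" / "username: text" layout and the UTF-8 encoding are assumptions made for this example only; they are not the format used by the actual training script.

```python
import json


def load_topics(path="dataset.json"):
    # Assumption: the file is UTF-8; adjust the encoding if the DBF
    # data was stored in a different code page.
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def topic_to_text(topic):
    # One header line with the topic title, then one line per message.
    lines = ["Topic: " + topic["topic"].strip()]
    for msg in topic["messages"]:
        lines.append(msg["username"].strip() + ": " + msg["text"].strip())
    return "\n".join(lines)


def dataset_to_texts(topics):
    # One text block per topic, in the same order as the JSON list.
    return [topic_to_text(t) for t in topics]
```

For example, dataset_to_texts( load_topics() ) would return one string per topic, which could then be written to a text file or tokenized directly.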
regards, saludos

Antonio Linares
www.fivetechsoft.com

Re: phpBB to LLM

Post by Antonio Linares »

I have edited the first post of this topic with the correct instructions:

https://fivetechsupport.com/forums/view ... 4b#p266364
regards, saludos

Antonio Linares
www.fivetechsoft.com