
Re: phpBB to LLM

Posted: Wed Dec 27, 2023 9:52 am
by Antonio Linares
Dear Anton,

many thanks for your help!

I am reviewing the results :-)

Re: phpBB to LLM

Posted: Wed Dec 27, 2023 10:10 am
by Antonio Linares
Here is run.py to test the model:

run.py
from transformers import GPT2LMHeadModel, GPT2Tokenizer
import torch

# Load the fine-tuned GPT-2 model and tokenizer
fine_tuned_model_path = "./fine-tuned-model"
model = GPT2LMHeadModel.from_pretrained(fine_tuned_model_path)
tokenizer = GPT2Tokenizer.from_pretrained(fine_tuned_model_path)

# Input prompt for text generation
prompt = "what is a star ?"

# Tokenize the input prompt
input_ids = tokenizer.encode(prompt, return_tensors="pt")
attention_mask = torch.ones_like(input_ids)
pad_token_id = tokenizer.eos_token_id
max_new_tokens = 50

# Generate text using the fine-tuned model
output = model.generate(
    input_ids,
    attention_mask=attention_mask,
    pad_token_id=pad_token_id,
    max_new_tokens=max_new_tokens,  # counts generated tokens only, not the prompt
    num_beams=5,
    no_repeat_ngram_size=2,
)

# Decode the generated tokens back to text
generated_text = tokenizer.decode(output[0], skip_special_tokens=True)

# Print the generated text
print("Generated Text:", generated_text)
 

Re: phpBB to LLM

Posted: Tue Jan 02, 2024 5:40 am
by Antonio Linares
From posts.dbf and posts.fpt we now generate dataset.json, the dataset used for training. We use just 20 different topics, so the dataset is not too large and we can run quicker tests with it:

dataset.prg
#include "FiveWin.ch"

request dbfcdx

function Main()

    local aPosts := {}, n

    USE posts VIA "dbfcdx"

    INDEX ON posts->topic + posts->date + posts->time + posts->forum TO subject
    GO TOP

    for n = 1 to 20
       if Eof()   // stop early if the table holds fewer than 20 topics
          exit
       endif
       AAdd( aPosts, GetTopic() )
    next
    hb_memoWrit( "dataset.json", hb_jsonEncode( aPosts ) )
    XBrowser( aPosts )

return nil

function GetTopic()

    local hTopic := {=>}, cTopic := RTrim( posts->topic )

    hTopic[ "topic" ]    = RTrim( posts->topic )
    hTopic[ "messages" ] = {}

    AAdd( hTopic[ "messages" ], GetPost() )
    SKIP
    // cTopic is trimmed, so trim the field too: == compares strings exactly
    while ! Eof() .and. RTrim( posts->topic ) == cTopic
       AAdd( hTopic[ "messages" ], GetPost() )
       SKIP
    end

return hTopic    

function GetPost()

    local hPost := {=>}

    hPost[ "topic" ]    = RTrim( posts->topic )
    hPost[ "forum" ]    = RTrim( posts->forum )
    hPost[ "username" ] = RTrim( posts->username )
    hPost[ "date" ]     = posts->date
    hPost[ "time" ]     = posts->time
    hPost[ "text" ]     = posts->text

return hPost    

The structure of the generated JSON file is as follows:
[
   {  "topic": the title of the topic,
      "messages":
      [
         {
            "topic": the title of the topic,
            "forum": the forum name,
            "username": name of the author,
            "date": date of the post,
            "time": time of the post,
            "text": text of the post
         },
        next posts for the same topic
      ]
   },
   next topic,
   ...
]

So basically it is a list of topics, each with its title and the list of messages posted in that topic.
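As a rough illustration of how this structure could be consumed on the Python side, here is a minimal sketch that flattens each topic into one plain-text sample. The helper names load_topics and topics_to_text, the "Topic:" / "username: text" layout, and the UTF-8 encoding are assumptions made for the example, not part of the original scripts, and the sample values are invented.

```python
import json


def load_topics(path):
    """Read dataset.json; UTF-8 is an assumption, adjust to the file's real encoding."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def topics_to_text(topics):
    """Flatten the topic list into one plain-text string per topic (hypothetical layout)."""
    samples = []
    for topic in topics:
        lines = ["Topic: " + topic["topic"]]
        for msg in topic["messages"]:
            # "username" and "text" are keys of each message in the structure above
            lines.append(msg["username"] + ": " + msg["text"].strip())
        samples.append("\n".join(lines))
    return samples


# Hand-made sample with the same shape as dataset.json (all values invented)
sample = [
    {
        "topic": "Example topic",
        "messages": [
            {"topic": "Example topic", "forum": "Example forum",
             "username": "alice", "date": "20240102", "time": "05:40:00",
             "text": "First post "},
            {"topic": "Example topic", "forum": "Example forum",
             "username": "bob", "date": "20240102", "time": "06:10:00",
             "text": "A reply"},
        ],
    }
]

for text in topics_to_text(sample):
    print(text)
```

With the real file, the same function would be called as topics_to_text( load_topics( "dataset.json" ) ). Keeping one string per topic keeps each question together with its replies, which is one possible choice among several.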

Re: phpBB to LLM

Posted: Sun Jan 07, 2024 9:28 am
by Antonio Linares
I have edited the first post of this topic with the correct instructions:

viewtopic.php?p=266364&sid=34d610603696853e410ee75921e1424b#p266364