phpBB to LLM

Re: phpBB to LLM

Post by Antonio Linares » Wed Dec 27, 2023 9:52 am

Dear Anton,

many thanks for your help!

I am reviewing the results :-)
regards, saludos

Antonio Linares
www.fivetechsoft.com
Antonio Linares
Site Admin
 
Posts: 41314
Joined: Thu Oct 06, 2005 5:47 pm
Location: Spain

Re: phpBB to LLM

Post by Antonio Linares » Wed Dec 27, 2023 10:10 am

Here is run.py to test the model:

run.py
from transformers import GPT2LMHeadModel, GPT2Tokenizer
import torch

# Load the fine-tuned GPT-2 model and tokenizer
fine_tuned_model_path = "./fine-tuned-model"
model = GPT2LMHeadModel.from_pretrained(fine_tuned_model_path)
tokenizer = GPT2Tokenizer.from_pretrained(fine_tuned_model_path)

# Input prompt for text generation
prompt = "what is a star ?"

# Tokenize the input prompt
input_ids = tokenizer.encode(prompt, return_tensors="pt")
attention_mask = torch.ones_like(input_ids)
pad_token_id = tokenizer.eos_token_id
max_new_tokens = 50

# Generate text using the fine-tuned model
output = model.generate(
    input_ids,
    attention_mask=attention_mask,
    pad_token_id=pad_token_id,
    max_new_tokens=max_new_tokens,  # same limit as max_length=len(input_ids[0]) + max_new_tokens
    num_beams=5,
    no_repeat_ngram_size=2,
)

# Decode the generated tokens back to text
generated_text = tokenizer.decode(output[0], skip_special_tokens=True)

# Print the generated text
print("Generated Text:", generated_text)
 

Re: phpBB to LLM

Post by Antonio Linares » Tue Jan 02, 2024 5:40 am

From posts.dbf and posts.fpt we now generate dataset.json, the file the model will be trained on. We use only 20 different topics, so the dataset stays small and we can run quicker tests with it:

dataset.prg
#include "FiveWin.ch"

request dbfcdx

function Main()

    local aPosts := {}, n

    USE posts VIA "dbfcdx"

    INDEX ON posts->topic + posts->date + posts->time + posts->forum TO subject
    GO TOP

    // Collect up to 20 topics, stopping early if the table has fewer
    for n = 1 to 20
       if Eof()
          exit
       endif
       AAdd( aPosts, GetTopic() )
    next
    hb_memoWrit( "dataset.json", hb_jsonEncode( aPosts ) )
    XBrowser( aPosts )

return nil

function GetTopic()

    local hTopic := {=>}, cTopic := RTrim( posts->topic )

    hTopic[ "topic" ]    = RTrim( posts->topic )
    hTopic[ "messages" ] = {}

    AAdd( hTopic[ "messages" ], GetPost() )
    SKIP
    // Trim before comparing (== is an exact match) and stop at end of file
    while ! Eof() .and. RTrim( posts->topic ) == cTopic
       AAdd( hTopic[ "messages" ], GetPost() )
       SKIP
    end

return hTopic    

function GetPost()

    local hPost := {=>}

    hPost[ "topic" ]    = RTrim( posts->topic )
    hPost[ "forum" ]    = RTrim( posts->forum )
    hPost[ "username" ] = RTrim( posts->username )
    hPost[ "date" ]     = posts->date
    hPost[ "time" ]     = posts->time
    hPost[ "text" ]     = posts->text

return hPost    
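
For readers who do not use Harbour, the grouping done by dataset.prg can be sketched in Python. This is only an illustration under assumptions: `rows` is a hypothetical list of dicts standing in for the records of posts.dbf, the sample values are made up, and `build_dataset` is a name chosen here, not part of the original code.

```python
from itertools import groupby


def build_dataset(rows, max_topics=20):
    """Group post rows into topic entries, mirroring dataset.prg.

    `rows` is a hypothetical stand-in for the records of posts.dbf;
    the keys match the field names used in the Harbour code.
    """
    # Same ordering idea as the index expression: topic, date, time, forum
    ordered = sorted(
        rows,
        key=lambda r: (r["topic"].rstrip(), r["date"], r["time"], r["forum"]),
    )
    dataset = []
    for topic, posts in groupby(ordered, key=lambda r: r["topic"].rstrip()):
        if len(dataset) == max_topics:
            break  # keep the dataset small, as in the original loop
        messages = [
            {
                "topic": p["topic"].rstrip(),
                "forum": p["forum"].rstrip(),
                "username": p["username"].rstrip(),
                "date": p["date"],
                "time": p["time"],
                "text": p["text"],
            }
            for p in posts
        ]
        dataset.append({"topic": topic, "messages": messages})
    return dataset


# Made-up sample rows; note the space-padded topic in the first one
rows = [
    {"topic": "Topic B   ", "forum": "General", "username": "bob",
     "date": "20240102", "time": "10:10", "text": "second"},
    {"topic": "Topic A", "forum": "General", "username": "carol",
     "date": "20240101", "time": "08:00", "text": "hello"},
    {"topic": "Topic B", "forum": "General", "username": "alice",
     "date": "20240102", "time": "09:52", "text": "first"},
]
dataset = build_dataset(rows)
```

Trimming the topic before grouping is what keeps padded and unpadded values of the same title in one entry.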

The structure of the generated JSON file is as follows:
[
   {  "topic": the title of the topic,
      "messages":
      [
         {
            "topic": the title of the topic,
            "forum": the forum name,
            "username": name of the author,
            "date": date of the post,
            "time": time of the post,
            "text": text of the post
         },
        next posts for the same topic
      ]
   },
   next topic,
   ...
]

So it is basically a list of topics, each with its title and the list of messages posted in that topic.
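
As a rough illustration of how this structure could be consumed on the Python side, the sketch below flattens a list of topics into plain text. The layout it produces ("Topic: ..." followed by "username: text" lines) is an assumption made here, not necessarily the format used for the actual training, and the sample data is made up. With the real file you would load it with `json.load(open("dataset.json"))` instead of `json.loads`.

```python
import json


def topic_to_text(topic):
    """Render one topic entry as a block of plain text (assumed layout)."""
    lines = ["Topic: " + topic["topic"]]
    for msg in topic["messages"]:
        lines.append(msg["username"] + ": " + msg["text"].strip())
    return "\n".join(lines)


def dataset_to_corpus(dataset, separator="\n\n"):
    """Join all topics into one text corpus, one block per topic."""
    return separator.join(topic_to_text(t) for t in dataset)


# Made-up sample with the same keys as dataset.json
sample_json = """
[
  {"topic": "Sample topic",
   "messages": [
     {"topic": "Sample topic", "forum": "General", "username": "alice",
      "date": "20240101", "time": "09:00", "text": "First post"},
     {"topic": "Sample topic", "forum": "General", "username": "bob",
      "date": "20240101", "time": "09:30", "text": "A reply "}
   ]},
  {"topic": "Other topic",
   "messages": [
     {"topic": "Other topic", "forum": "General", "username": "carol",
      "date": "20240102", "time": "10:00", "text": "Hello"}
   ]}
]
"""

corpus = dataset_to_corpus(json.loads(sample_json))
print(corpus)
```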

Re: phpBB to LLM

Post by Antonio Linares » Sun Jan 07, 2024 9:28 am

I have edited the first post of this topic with the correct instructions:

viewtopic.php?p=266364&sid=34d610603696853e410ee75921e1424b#p266364
