KERMIT: Knowledge Extractive and Reasoning Model usIng Transformers
2024 (English) Independent thesis Basic level (degree of Bachelor), 10 credits / 15 HE credits
Student thesis
Sustainable development
The essay/thesis is partially on sustainable development according to the University's criteria
Abstract [en]
In the rapidly advancing field of artificial intelligence, Large Language Models (LLMs) like GPT-3, GPT-4, and Gemini have revolutionized sectors by automating complex tasks. Despite these advances, LLMs and, more noticeably, smaller language models (SLMs) still face challenges such as generating unfounded content ("hallucinations"). This project aims to enhance SLMs so that they are more broadly accessible without extensive computational infrastructure. Smaller models were trained with supervised fine-tuning on two new datasets, SQUAD-ei and SQUAD-GPT. The resulting model, KERMIT-7B, achieved superior performance on TYDIQA-GoldP, demonstrating improved information extraction while retaining generative quality.
Abstract [sv]
Inom det snabbt växande området artificiell intelligens har stora språkmodeller (LLM) som GPT-3, GPT-4 och Gemini revolutionerat sektorer genom att automatisera komplexa uppgifter. Trots sina framsteg står dessa modeller, framför allt mindre språkmodeller (SLMs), fortfarande inför utmaningar, till exempel att generera ogrundat innehåll, "hallucinationer". Denna studie syftar till att förbättra SLMs för bredare tillgänglighet utan krävande infrastruktur. Genom supervised fine-tuning av mindre modeller med nya dataset, SQUAD-ei och SQUAD-GPT, uppnådde den resulterande modellen, KERMIT-7B, överlägsen prestanda i TYDIQA-GoldP, vilket visar förbättrad informationsutvinning samtidigt som den generativa kvaliteten bibehålls.
Place, publisher, year, edition, pages
2024. p. 124
Keywords [en]
KERMIT-7B, SQUAD-ei, SQUAD-GPT, Artificial Intelligence (AI), Large Language Models (LLMs), Small Language Models (SLMs), Supervised Fine-tuning, Information Extraction.
Keywords [sv]
KERMIT-7B, SQUAD-ei, SQUAD-GPT, Artificiell intelligens (AI), stora språkmodeller (LLM), små språkmodeller (SLM), övervakad finjustering, informationsutvinning.
National Category
Natural Language Processing
Identifiers
URN: urn:nbn:se:hig:diva-44763
OAI: oai:DiVA.org:hig-44763
DiVA, id: diva2:1872915
External cooperation
Research Institutes of Sweden
Subject / course
Computer science
Educational program
Study Programme in Computer Science
Supervisors
Examiners
2024-06-19 2024-06-18 2025-02-07 Bibliographically approved