KERMIT: Knowledge Extractive and Reasoning Model usIng Transformers
University of Gävle, Faculty of Engineering and Sustainable Development, Department of Computer and Geospatial Sciences, Computer Science.
2024 (English). Independent thesis, Basic level (degree of Bachelor), 10 credits / 15 HE credits. Student thesis.
Sustainable development
The essay/thesis is partially on sustainable development according to the University's criteria
Abstract [en]

In the rapidly advancing field of artificial intelligence, Large Language Models (LLMs) such as GPT-3, GPT-4, and Gemini have revolutionized many sectors by automating complex tasks. Despite these advances, LLMs, and more noticeably smaller language models (SLMs), still face challenges such as generating unfounded content ("hallucinations"). This project aims to enhance SLMs for broader accessibility without extensive computational infrastructure. By supervised fine-tuning of smaller models on two new datasets, SQUAD-ei and SQUAD-GPT, the resulting model, KERMIT-7B, achieved superior performance on TYDIQA-GoldP, demonstrating improved information extraction while retaining generative quality.
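The abstract describes supervised fine-tuning of smaller models on SQuAD-style extractive QA data. As a minimal sketch of what such a training pipeline's data preparation could look like, the snippet below converts one SQuAD-format record into a prompt/target pair for causal-LM fine-tuning. The field names follow the public SQuAD format; the prompt template itself is an illustrative assumption, not the thesis's actual formatting (the SQUAD-ei and SQUAD-GPT datasets are not documented in this record).

```python
# Sketch: turning a SQuAD-style QA record into a (prompt, target) pair
# for supervised fine-tuning of a causal language model.
# The template is an assumed example, not the thesis's actual one.

def format_example(record: dict) -> dict:
    """Build an instruction-style prompt/target pair from a SQuAD record."""
    prompt = (
        "Extract the answer to the question from the context.\n"
        f"Context: {record['context']}\n"
        f"Question: {record['question']}\n"
        "Answer:"
    )
    # The model is trained to continue the prompt with the gold answer span.
    target = " " + record["answers"]["text"][0]
    return {"prompt": prompt, "target": target}

example = {
    "context": "Kermit the Frog first appeared in 1955.",
    "question": "When did Kermit the Frog first appear?",
    "answers": {"text": ["1955"], "answer_start": [34]},
}

pair = format_example(example)
print(pair["target"])  # prints " 1955"
```

In a full pipeline, prompt and target would be tokenized together, with the loss masked over the prompt tokens so only the answer span contributes to the fine-tuning objective.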

Abstract [sv]

(Translated from Swedish:) In the rapidly growing field of artificial intelligence, large language models (LLMs) such as GPT-3, GPT-4, and Gemini have revolutionized sectors by automating complex tasks. Despite their progress, these models, and above all smaller language models (SLMs), still face challenges such as generating unfounded content ("hallucinations"). This study aims to improve SLMs for broader accessibility without demanding infrastructure. By supervised fine-tuning of smaller models on two new datasets, SQUAD-ei and SQUAD-GPT, the resulting model, KERMIT-7B, achieved superior performance on TYDIQA-GoldP, demonstrating improved information extraction while maintaining generative quality.

Place, publisher, year, edition, pages
2024, p. 124
Keywords [en]
KERMIT-7B, SQUAD-ei, SQUAD-GPT, Artificial Intelligence (AI), Large Language Models (LLMs), Small Language Models (SLMs), Supervised Fine-tuning, Information Extraction
Keywords [sv]
(Translated from Swedish:) KERMIT-7B, SQUAD-ei, SQUAD-GPT, artificial intelligence (AI), large language models (LLMs), small language models (SLMs), supervised fine-tuning, information extraction
National Category
Natural Language Processing
Identifiers
URN: urn:nbn:se:hig:diva-44763
OAI: oai:DiVA.org:hig-44763
DiVA id: diva2:1872915
External cooperation
Research Institutes of Sweden
Subject / course
Computer science
Educational program
Study Programme in Computer Science
Available from: 2024-06-19
Created: 2024-06-18
Last updated: 2025-02-07
Bibliographically approved

Open Access in DiVA

fulltext (5279 kB), 215 downloads
File information
File name: FULLTEXT01.pdf
File size: 5279 kB
Checksum: SHA-512
a64f9a31f183690444f10bf0c8dd51ebc8944cdc8fc4d10833da612dc90faa1531ad8cc26baf66dc18f955bad1843dae63b553309b980f4e87cf5c9387202a60
Type: fulltext
Mimetype: application/pdf

By organisation
Computer Science
Natural Language Processing

Search outside of DiVA

Google / Google Scholar
Total: 215 downloads
The number of downloads is the sum of all downloads of full texts. It may include, e.g., previous versions that are no longer available.

Total: 255 hits