hig.se Publications
Källgranskning med RAG och småspråkmodeller (Source verification with RAG and small language models)
University of Gävle, Faculty of Engineering and Sustainable Development, Department of Computer and Geospatial Sciences, Computer Science.
University of Gävle, Faculty of Engineering and Sustainable Development, Department of Computer and Geospatial Sciences, Computer Science.
2025 (Swedish). Independent thesis, Basic level (university diploma), 10 credits / 15 HE credits. Student thesis
Sustainable development
The essay/thesis is mainly on sustainable development according to the University's criteria
Abstract [sv] (translated into English)

This thesis investigates whether source verification of students' academic texts can be automated using Retrieval-Augmented Generation (RAG) in combination with a small language model (SLM). The problem addressed is the time-consuming task of manually checking whether students' claims are supported by the cited references, a process that often lacks systematic support.

A prototype source-verification system was developed in Python with Streamlit and implements a local RAG architecture. The system uses Gemma 3:4b as its language model via Ollama and FAISS as its vector database. Text segmentation and embeddings are handled by LangChain components. To improve the precision of the retrieval stage, different combinations of chunk size, overlap, and cosine similarity threshold were tested.

The system was evaluated by comparing its assessments with those of human reviewers and three large language models (ChatGPT o4, DeepSeek-R1, Gemini 2.5 Flash). In test cases where the source either fully supported or did not support the claim, the system showed strong agreement with the human assessments, with an average deviation of less than one grade step. In borderline cases, the system tended to be somewhat more generous.

To evaluate usability, a System Usability Scale (SUS) survey was also conducted. It gave an average usability score of 77.5, which is classed as good.

The conclusion is that a RAG-based system using an SLM can give accurate assessments of source support, even on hardware with limited resources. The system can serve as a relevant aid for teachers in educational settings and shows potential for further development in AI-based academic review.
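The three retrieval parameters named in the abstract (chunk size, chunk overlap, and cosine similarity threshold) can be illustrated with a minimal standard-library sketch. This is not the thesis's implementation: the function names are hypothetical, chunking is done on characters, and `toy_embed` is a simple word-count stand-in for the sentence-embedding model and FAISS index that the prototype actually uses.

```python
import math
from collections import Counter


def chunk_text(text: str, chunk_size: int, overlap: int) -> list[str]:
    """Split text into character windows of chunk_size that share `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


def toy_embed(text: str) -> Counter:
    """Stand-in embedding (assumption): lowercase word counts, not a neural model."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(claim: str, chunks: list[str], threshold: float) -> list[tuple[float, str]]:
    """Return (score, chunk) pairs at or above the threshold, best match first."""
    claim_vec = toy_embed(claim)
    scored = [(cosine_similarity(claim_vec, toy_embed(c)), c) for c in chunks]
    kept = [pair for pair in scored if pair[0] >= threshold]
    return sorted(kept, key=lambda pair: pair[0], reverse=True)
```

A larger overlap makes it less likely that a supporting sentence is cut in half at a chunk boundary, while a higher threshold passes fewer but more relevant chunks on to the language model. That trade-off is presumably what the parameter experiments explored.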

Abstract [en]

This undergraduate thesis explores the development of an AI-based source-checking system for student academic writing, utilizing a combination of Retrieval-Augmented Generation (RAG) and a Small Language Model (SLM). The project addresses the challenge of manually verifying whether student claims are supported by cited references, which is often a time-consuming task in higher education.

A prototype system was implemented in Python using Streamlit and LangChain, integrating a locally run Gemma 3:4b language model via Ollama and a FAISS vector database. Texts from uploaded reference PDFs are segmented into chunks, embedded using paraphrase-multilingual-MiniLM-L12-v2, and indexed for retrieval. Several experiments were conducted to optimize key RAG parameters, including chunk size, chunk overlap, and cosine similarity threshold.

The system's performance was evaluated against human reviewer assessments and three large language models (ChatGPT o4, DeepSeek-R1, Gemini 2.5 Flash). It demonstrated strong agreement with human judgments in cases of clear factual support or lack thereof, with an average deviation of less than one grade step. Some discrepancies occurred in borderline cases, where the system tended to be more lenient.

A usability study using the System Usability Scale (SUS) yielded an average score of 77.5, indicating good usability and acceptance among testers.

The findings suggest that a RAG-based system powered by an SLM can offer reliable factual assessments while being deployable on resource-constrained hardware. The system holds promise as a support tool in educational settings and provides a strong foundation for future work in AI-assisted academic verification.
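For readers unfamiliar with how a SUS figure such as 77.5 is obtained, the standard SUS scoring rule can be sketched as follows. The function names are hypothetical, and no questionnaire data from the thesis is reproduced here. Each respondent answers ten items on a 1 to 5 scale: odd-numbered items contribute (answer - 1), even-numbered items contribute (5 - answer), and the sum is multiplied by 2.5 to give a score from 0 to 100.

```python
def sus_score(responses: list[int]) -> float:
    """Score one respondent's ten SUS answers (each 1-5) on the 0-100 scale."""
    if len(responses) != 10 or any(not 1 <= r <= 5 for r in responses):
        raise ValueError("expected ten answers, each between 1 and 5")
    total = 0
    for item, answer in enumerate(responses, start=1):
        # Odd items are positively worded, even items negatively worded.
        total += (answer - 1) if item % 2 == 1 else (5 - answer)
    return total * 2.5


def mean_sus(all_responses: list[list[int]]) -> float:
    """Average the per-respondent SUS scores, as reported in a study summary."""
    scores = [sus_score(r) for r in all_responses]
    return sum(scores) / len(scores)
```

The reported average would then correspond to `mean_sus` applied to all testers' answer sheets.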

Place, publisher, year, edition, pages
2025. p. 46
Keywords [sv]
AI, RAG, LLM, SLM, retriever augmented generation, small language model, large language model, källgranskning, gemma3, language model
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:hig:diva-47778
OAI: oai:DiVA.org:hig-47778
DiVA, id: diva2:1978994
Subject / course
Computer science
Educational program
Computer engineering/Electrical engineering – Internet based
Presentation
2025-06-11, 99131, Kungsbäcksvägen 47, Gävle, 15:40 (Swedish)
Supervisors
Examiners
Available from: 2025-07-01. Created: 2025-06-29. Last updated: 2025-10-02. Bibliographically approved

Open Access in DiVA

fulltext (1870 kB), 39 downloads
File information
File name: FULLTEXT01.pdf
File size: 1870 kB
Checksum (SHA-512): 96b9f1095aefa30363492ef273f42de0c1d89aea4edf9bb1dc5b62f3b92b86bcdfcbee06d1cb1524495a5cce81a84b638a283158e68a8539c6e1b5741630a95e
Type: fulltext
Mimetype: application/pdf

Search in DiVA

By author/editor
Karkoush, Joseph
Ali, Mohammed
By organisation
Computer Science
Computer Sciences

Total: 39 downloads
The number of downloads is the sum of all downloads of full texts. It may include, e.g., previous versions that are no longer available.

Total: 113 hits