Assessing Gemini and GPT on their ability to classify sentiment in video game reviews
University of Gävle, Faculty of Engineering and Sustainable Development, Department of Computer and Geospatial Sciences, Computer Science.
2025 (English). Independent thesis, Basic level (degree of Bachelor), 10 credits / 15 HE credits. Student thesis.
Abstract [en]

This study evaluates how well two state-of-the-art Large Language Models (LLMs), GPT-4o-mini and Gemini-2.0-Flash, classify sentiment and “helpfulness” opinion in user-generated video game reviews. We created a unique dataset of 10,000 annotated game reviews retrieved from the Steam platform. Each review carries both a sentiment rating (positive/negative) and a perceived helpfulness rating (helpful/unhelpful/fun). Using a multi-task prompting approach and a predefined system instruction, both models were prompted across 30,000 instances. Evaluation metrics included accuracy, F1-score, and Krippendorff’s alpha to assess model consistency and inter-model agreement.
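The three metrics named above can be illustrated with a minimal sketch in plain Python. This is not the thesis code: the function names, the label strings, and the layout of `units` (one list of labels per review, one label per run or model) are assumptions made for illustration, and the alpha shown is the nominal-data variant computed from a coincidence matrix.

```python
from collections import Counter
from itertools import permutations


def accuracy(y_true, y_pred):
    """Share of predictions that match the reference labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred, positive):
    """F1 for one class, treating `positive` as the class of interest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal labels.

    `units` holds one list per rated item (hypothetical layout): the labels
    that different runs or models assigned to that item.
    """
    coincidences = Counter()
    for labels in units:
        m = len(labels)
        if m < 2:
            continue  # an item with a single label carries no agreement information
        for a, b in permutations(labels, 2):
            coincidences[(a, b)] += 1 / (m - 1)
    totals = Counter()
    for (a, _), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())
    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(
        totals[a] * totals[b] for a in totals for b in totals if a != b
    ) / (n - 1)
    if expected == 0:
        return 1.0  # only one label was ever used, so no disagreement is possible
    return 1 - observed / expected


# Toy data, invented for illustration only.
gold = ["pos", "neg", "pos", "pos"]
pred = ["pos", "neg", "neg", "pos"]
print(accuracy(gold, pred))
print(f1_score(gold, pred, positive="pos"))
print(krippendorff_alpha_nominal([["a", "a"], ["a", "b"], ["b", "b"], ["b", "a"]]))
```

Accuracy and F1 compare model output against the annotated label, whereas alpha compares labels with each other, which is why it suits consistency across repeated runs and agreement between the two models.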

Results revealed high sentiment classification accuracy (GPT: 93%, Gemini: 94%) and lower performance on opinion classification, particularly for negatively labeled reviews. The findings suggest that these LLMs are well suited to binary sentiment classification in game reviews, whereas predicting perceived helpfulness is a more complex task that requires further study before a conclusive evaluation is possible.

The results have practical significance for applications that aim to use these LLMs in, for example, automated content review.

Abstract [sv]

This study evaluates the ability of two advanced Large Language Models (LLMs), GPT-4o-mini and Gemini-2.0-Flash, to classify sentiment and a perceived-helpfulness opinion in user-generated game reviews. We created a unique dataset containing 10,000 annotated game reviews retrieved from the Steam platform. Each review contained both a sentiment classification (positive/negative) and a perceived-helpfulness classification (helpful/unhelpful/fun). Using a multi-task prompting method and a predefined system instruction, both models were tested on a total of 30,000 cases.

The evaluation metrics included accuracy, F1-score, and Krippendorff's alpha to measure the models' consistency and agreement. The results showed high accuracy in sentiment classification (GPT: 93%, Gemini: 94%) but lower precision in opinion classification, especially for negatively labeled reviews. The conclusions indicate that these LLMs are well suited to binary sentiment classification in game reviews, but that classifying perceived helpfulness is a more complex task that requires further research with a more elaborate methodology leading to a more definitive evaluation.

The results have practical significance for applications of these LLMs in, for example, automated content review.

Place, publisher, year, edition, pages
2025. p. 36
Keywords [en]
Large Language Models (LLMs), Gemini-2.0-flash, GPT-4o-mini, Sentiment Analysis, Video Game Reviews, User-Generated Content, Prompt Engineering, Zero-shot learning, Multitask
Keywords [sv]
Large Language Models (LLMs), Gemini-2.0-flash, GPT-4o-mini, Sentiment Analysis, Game Reviews, User-Generated Content, Prompt Engineering, Zero-shot learning, Multitask
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:hig:diva-47499
OAI: oai:DiVA.org:hig-47499
DiVA, id: diva2:1973484
Subject / course
Computer science
Educational program
Study Programme in Computer Science
Available from: 2025-06-23. Created: 2025-06-19. Last updated: 2025-10-02. Bibliographically approved.

Open Access in DiVA

fulltext (755 kB), 122 downloads
File information
File name: FULLTEXT01.pdf
File size: 755 kB
Checksum (SHA-512): 4ae5aed0a4901dab97fca6f98caa8703c6e04cb7983649da88e452d3d17d5535ef27e4af375cb559440c629ee74140ca2e02d03553948ce9cc266c8c034a544b
Type: fulltext
Mimetype: application/pdf

Authors
Hamner, Simon; Brinkenstråhle Dahlin, Fabian
