Large scale continuous dating of medieval scribes using a combined image and language model
Department of Information Technology, Uppsala University, Sweden.
University of Gävle, Faculty of Education and Business Studies, Department of Humanities, Swedish and Gender studies.
Department of Information Technology, Uppsala University, Sweden.
2016 (English). In: Proceedings - 12th IAPR International Workshop on Document Analysis Systems, DAS 2016, 2016, 48-53 p., 7490092. Conference paper, Published paper (Refereed)
Abstract [en]

Finding the production date of a pre-modern manuscript is commonly a long process in historical research, requiring days of work from highly specialised experts. In this paper, we present an automatic dating method based on modelling both the language and the image data. By creating a statistical model over the changes in the pen strokes and short character sequences in the transcribed text, we combine multiple estimators to give a distribution over the timeline for each manuscript. We have evaluated our estimation scheme on the medieval charter collection "Svenskt Diplomatariums huvudkartotek" (SDHK), including more than 5300 transcribed charters from the period 1135-1509. Our system is capable of achieving a median absolute error of 12 years, where the only human input is a transcription of the charter text. Since reading and transcribing the text is a skill that many researchers and students have, in contrast to the more specialised skill of dating medieval manuscripts based on palaeographical expertise, we find our novel approach suitable for helping individual researchers date collections of manuscript pages. For larger collections, transcriptions could also be collected through crowdsourcing.
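
The abstract does not say how the individual estimators are combined or how the error is computed in detail. As a rough illustration only, the sketch below assumes each estimator yields a probability distribution over calendar years and that the estimators are combined by multiplying per-year probabilities (a naive independence assumption, not the authors' stated method). The Gaussian estimator and all function names are hypothetical; only the year range 1135-1509 and the median absolute error metric come from the abstract.

```python
import math
from statistics import median

# Year range of the SDHK charters as stated in the abstract.
YEARS = list(range(1135, 1510))


def gaussian_estimate(mean, std):
    """Hypothetical estimator output: a normalised discrete Gaussian over YEARS."""
    weights = [math.exp(-0.5 * ((y - mean) / std) ** 2) for y in YEARS]
    total = sum(weights)
    return [w / total for w in weights]


def combine(distributions):
    """Assumed combination rule: multiply per-year probabilities, then renormalise.

    The product is computed in log space to avoid numerical underflow.
    """
    log_scores = [
        sum(math.log(d[i] + 1e-300) for d in distributions)
        for i in range(len(YEARS))
    ]
    peak = max(log_scores)
    weights = [math.exp(s - peak) for s in log_scores]
    total = sum(weights)
    return [w / total for w in weights]


def point_estimate(distribution):
    """Return the most probable year of a distribution over YEARS."""
    best_index = max(range(len(YEARS)), key=distribution.__getitem__)
    return YEARS[best_index]


def median_absolute_error(predicted, actual):
    """Median of the absolute differences between predicted and true years."""
    return median(abs(p - a) for p, a in zip(predicted, actual))
```

For example, combining a hypothetical image-based estimate centred on 1350 with a hypothetical language-based estimate centred on 1370 (both with a standard deviation of 30 years) gives a combined distribution whose most probable year is 1360. Predictions of 1360, 1400 and 1300 against true dates of 1350, 1415 and 1330 give a median absolute error of 15 years.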

Place, publisher, year, edition, pages
2016. 48-53 p., 7490092
Keyword [en]
Character recognition, Computational linguistics, Modeling languages, Students, Absolute error, Combined images, Estimation schemes, Historical research, Image data, Language model, Medieval manuscript, Statistical modeling, Transcription
National Category
Language Technology (Computational Linguistics)
Identifiers
URN: urn:nbn:se:hig:diva-22524
DOI: 10.1109/DAS.2016.71
ISI: 000390411200009
Scopus ID: 2-s2.0-84979500480
OAI: oai:DiVA.org:hig-22524
DiVA: diva2:975118
Conference
12th IAPR International Workshop on Document Analysis Systems, DAS 2016, 11-14 April 2016, Santorini, Greece
Available from: 2016-09-28. Created: 2016-09-28. Last updated: 2017-02-06. Bibliographically approved.

By author/editor
Mårtensson, Lasse