Large scale continuous dating of medieval scribes using a combined image and language model
2016 (English). In: Proceedings - 12th IAPR International Workshop on Document Analysis Systems, DAS 2016, 2016, p. 48-53, article id 7490092
Conference paper, Published paper (Refereed)
Abstract [en]
Finding the production date of a pre-modern manuscript is commonly a long process in historical research, requiring days of work from highly specialised experts. In this paper, we present an automatic dating method based on modelling both the language and the image data. By creating statistical models of the changes in pen strokes and in short character sequences of the transcribed text, a combination of multiple estimators gives a distribution over the timeline for each manuscript. We have evaluated our estimation scheme on the medieval charter collection "Svenskt Diplomatariums huvudkartotek" (SDHK), which includes more than 5300 transcribed charters from the period 1135-1509. Our system achieves a median absolute error of 12 years, and the only human input it requires is a transcription of the charter text. Since reading and transcribing text is a skill that many researchers and students have, whereas dating medieval manuscripts from palaeographical evidence requires more specialised expertise, we find our novel approach suitable for helping individual researchers date collections of manuscript pages. For larger collections, transcriptions could also be collected through crowdsourcing.
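The abstract does not state how the estimators are combined. The following is a minimal sketch, assuming each estimator (for example, one built on pen-stroke features and one on short character sequences) outputs a probability distribution over the years 1135 to 1509, and that these are pooled by multiplying and renormalising. This product-of-experts rule is chosen here for illustration and is not necessarily the authors' method. The posterior median serves as a point estimate, and the median absolute error is the metric the abstract reports. All function names and the Gaussian toy estimators are hypothetical.

```python
import math
from statistics import median

# Year grid covering the SDHK period stated in the abstract.
YEARS = list(range(1135, 1510))


def combine(distributions):
    """Pool per-estimator distributions over YEARS by multiplying them.

    Assumption: product-of-experts pooling, computed in log space for
    numerical stability; not necessarily the paper's combination rule.
    """
    log_post = [0.0] * len(YEARS)
    for dist in distributions:
        for i, p in enumerate(dist):
            log_post[i] += math.log(max(p, 1e-12))  # floor avoids log(0)
    m = max(log_post)
    weights = [math.exp(v - m) for v in log_post]
    total = sum(weights)
    return [w / total for w in weights]


def posterior_median(dist):
    """Return the first year at which cumulative probability reaches 0.5."""
    cum = 0.0
    for year, p in zip(YEARS, dist):
        cum += p
        if cum >= 0.5:
            return year
    return YEARS[-1]


def gaussian_bump(center, width):
    """Hypothetical toy estimator output: a discretised Gaussian over YEARS."""
    w = [math.exp(-0.5 * ((y - center) / width) ** 2) for y in YEARS]
    s = sum(w)
    return [v / s for v in w]


def median_absolute_error(predicted, actual):
    """Median of |predicted - actual| over a set of dated charters."""
    return median(abs(p - a) for p, a in zip(predicted, actual))


if __name__ == "__main__":
    image_est = gaussian_bump(1350, 30)  # hypothetical pen-stroke estimator
    text_est = gaussian_bump(1370, 20)   # hypothetical character n-gram estimator
    posterior = combine([image_est, text_est])
    print(posterior_median(posterior))
    print(median_absolute_error([1300, 1400, 1250], [1310, 1395, 1280]))
```

Multiplying the distributions lets a sharper estimator dominate where it is confident, and the resulting posterior lands between the two toy estimates, closer to the narrower one. Other pooling rules, such as a weighted average of the distributions, would be equally plausible readings of "a combination of multiple estimators".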
Place, publisher, year, edition, pages
2016. p. 48-53, article id 7490092
Keywords [en]
Character recognition, Computational linguistics, Modeling languages, Students, Absolute error, Combined images, Estimation schemes, Historical research, Image data, Language model, Medieval manuscript, Statistical modeling, Transcription
National Category
Language Technology (Computational Linguistics)
Identifiers
URN: urn:nbn:se:hig:diva-22524
DOI: 10.1109/DAS.2016.71
ISI: 000390411200009
Scopus ID: 2-s2.0-84979500480
OAI: oai:DiVA.org:hig-22524
DiVA, id: diva2:975118
Conference
12th IAPR International Workshop on Document Analysis Systems, DAS 2016, 11-14 April 2016, Santorini, Greece
Available from: 2016-09-28 Created: 2016-09-28 Last updated: 2022-09-09
Bibliographically approved