Data Mining Medieval Documents by Word Spotting
Uppsala universitet, Centrum för bildanalys.
Uppsala universitet, Institutionen för lingvistik och filologi.
Uppsala universitet, Institutionen för nordiska språk. ORCID iD: 0000-0001-5072-4961
Uppsala universitet, Centrum för bildanalys. ORCID iD: 0000-0002-4405-6888
2011 (English). In: Proceedings of the 2011 Workshop on Historical Document Imaging and Processing, New York: ACM, 2011, p. 75-82. Conference paper, Published paper (Refereed)
Abstract [en]

This paper presents novel results for word spotting based on dynamic time warping applied to medieval manuscripts in Latin and Old Swedish. A target word is marked by a user, and the method automatically finds similar word forms in the document by matching them against the target. The method automatically identifies pages and lines. We show that our method improves accuracy compared to earlier proposals for this kind of handwriting. An advantage of the new method is that it performs matching within a text line without presupposing that the difficult problem of segmenting the text line into individual words has been solved. We evaluate our word spotting implementation on two medieval manuscripts representing two script types. We also show that it can be useful by helping a user find words in a manuscript and present graphs of word statistics as a function of page number.
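
The idea of matching a target word inside an unsegmented text line can be illustrated with subsequence dynamic time warping. The following is a minimal sketch only, not the authors' implementation: the function name `subsequence_dtw`, the per-column feature vectors, the Euclidean local distance, and the toy data are all assumptions made for illustration, since the abstract does not specify them.

```python
import math


def subsequence_dtw(query, line):
    """Find the span of `line` that best matches `query`.

    query: list of feature vectors, one per pixel column of the target word.
    line:  list of feature vectors, one per pixel column of a whole text line.
    Returns (cost, start, end), with `end` exclusive.
    All names and the feature representation are hypothetical.
    """
    m, n = len(query), len(line)
    # Local distance between every query column and every line column.
    dist = [[math.dist(q, c) for c in line] for q in query]

    acc = [[math.inf] * n for _ in range(m)]   # accumulated cost
    start = [[0] * n for _ in range(m)]        # start column of the best path

    # First row: a match may begin at any column of the line at no extra
    # cost, so the line does not need to be segmented into words first.
    for j in range(n):
        acc[0][j] = dist[0][j]
        start[0][j] = j

    for i in range(1, m):
        acc[i][0] = acc[i - 1][0] + dist[i][0]
        start[i][0] = start[i - 1][0]
        for j in range(1, n):
            # Standard DTW steps: diagonal, vertical, horizontal.
            best_cost, best_start = min(
                (acc[i - 1][j - 1], start[i - 1][j - 1]),
                (acc[i - 1][j], start[i - 1][j]),
                (acc[i][j - 1], start[i][j - 1]),
                key=lambda c: c[0],
            )
            acc[i][j] = dist[i][j] + best_cost
            start[i][j] = best_start

    # The match may also end at any column: take the cheapest end point.
    end = min(range(n), key=lambda j: acc[m - 1][j])
    return acc[m - 1][end], start[m - 1][end], end + 1


# Toy one-dimensional features: the "word" 1, 2, 3 embedded in a line.
line = [[0.0], [0.0], [1.0], [2.0], [3.0], [0.0], [0.0]]
query = [[1.0], [2.0], [3.0]]
print(subsequence_dtw(query, line))  # (0.0, 2, 5)
```

Because the first row of the cost matrix is not accumulated, the alignment can start anywhere in the line, which is what removes the need for prior word segmentation in this sketch. A spotting system would run such a match over every line and rank the resulting spans by cost.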

Place, publisher, year, edition, pages
New York: ACM, 2011. p. 75-82
National Category
Humanities and the Arts; Natural Language Processing
Research subject
Computational Linguistics; Computerized Image Processing
Identifiers
URN: urn:nbn:se:hig:diva-25267
DOI: 10.1145/2037342.2037355
ISBN: 978-1-4503-0916-5 (print)
OAI: oai:DiVA.org:hig-25267
DiVA, id: diva2:1141929
Conference
Workshop on Historical Document Imaging and Processing, 16-17 Sep 2011, Beijing, China
Available from: 2017-09-18 Created: 2017-09-18 Last updated: 2025-02-01 Bibliographically approved

Authority records

Mårtensson, Lasse; Brun, Anders
