The RODRIGO Database

Annotated digitized pages from historical document collections are essential to research on the automatic extraction of text blocks and lines and on handwriting recognition. We recently introduced a new handwritten text database, GERMANA, based on a Spanish manuscript from 1891. To our knowledge, GERMANA is the first publicly available database written mostly in Spanish and comparable in size to standard databases. In this paper, we present another handwritten text database, RODRIGO, written entirely in Spanish and comparable in size to GERMANA. However, RODRIGO comes from a much older manuscript, dated 1545, in which the difficulties typical of historical documents are more evident. In particular, the writing style, which shows clear Gothic influences, is significantly more complex than that of GERMANA. We also provide baseline handwriting recognition results for reference in future studies, obtained with standard techniques and tools for preprocessing, feature extraction, HMM-based image modelling, and language modelling.
