On-the-fly Historical Handwritten Text Annotation

The performance of information retrieval algorithms depends on the availability of ground truth labels annotated by experts. Difficulties arise when these labels are incorrect or incomplete because of high levels of degradation in the documents. To address this problem, this paper presents a simple method for on-the-fly annotation of degraded historical handwritten text in ancient manuscripts. The method aims to generate ground truth quickly and to correct inaccurate annotations so that each bounding box encloses its word exactly and contains no added noise from the background or surroundings. It is intended to help historians and researchers generate and correct word labels in a document dynamically. The effectiveness of the annotation method is evaluated empirically on an archival manuscript collection drawn from well-known, publicly available datasets.
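To make the bounding-box goal concrete, the following is a minimal sketch of one way a rough word box could be shrunk to the ink it contains. It is not the paper's method: the function name `tighten_box`, the `(top, left, bottom, right)` box convention, and the assumption that the page is already binarized (1 for ink, 0 for background) are all hypothetical choices made for illustration.

```python
from typing import List, Optional, Tuple

# Hypothetical box convention: (top, left, bottom, right), bottom/right exclusive.
Box = Tuple[int, int, int, int]


def tighten_box(binary: List[List[int]], box: Box) -> Optional[Box]:
    """Shrink `box` to the smallest rectangle holding all ink pixels inside it.

    `binary` is a row-major image where 1 marks ink and 0 marks background
    (assumed to be binarized beforehand). Returns None if the box has no ink.
    """
    top, left, bottom, right = box

    # Clip the rough box to the image bounds.
    height = len(binary)
    width = len(binary[0]) if height else 0
    top, left = max(top, 0), max(left, 0)
    bottom, right = min(bottom, height), min(right, width)

    # Rows inside the box that contain at least one ink pixel.
    rows = [r for r in range(top, bottom) if any(binary[r][left:right])]
    if not rows:
        return None

    # Columns inside the box that contain ink in at least one of those rows.
    cols = [c for c in range(left, right) if any(binary[r][c] for r in rows)]

    return rows[0], cols[0], rows[-1] + 1, cols[-1] + 1


# Usage: a loose box around a small blob of ink.
page = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
tight = tighten_box(page, (0, 0, 4, 6))
```

A sketch like this only trims empty margins. It does not separate a word from background noise or neighboring strokes that fall inside the rough box, which is the harder part of the problem the abstract describes.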
