A Novel Transcript Mapping Technique for Handwritten Document Images

Transcript mapping is the process of aligning meaningful units of a handwritten document image (e.g., text lines, words, characters) with the corresponding transcription. Its applications include (i) fast generation of ground truth at different granularity levels and (ii) indexing of handwritten collections for document retrieval. This paper proposes a novel transcript mapping technique that is guided by the number of words in a text line and the number of characters in each word. The proposed method combines the results of a local and a global approach using a scoring algorithm. Its effectiveness is demonstrated by experiments on a known, publicly available dataset, where it achieves a word-level alignment accuracy of 99.48%.
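To make the idea of count-guided alignment concrete, the following is a minimal sketch and not the method of the paper, whose local and global approaches and scoring algorithm are not specified in this abstract. It assumes a text line is already segmented into connected components given as hypothetical `(x_start, x_end)` pairs sorted left to right. The function `align_line` uses the number of transcript words to keep only the widest inter-component gaps as word boundaries, and `length_score` is an assumed, illustrative score that compares relative word-image widths with relative character counts.

```python
def align_line(components, transcript):
    """Map each transcript word to an (x_start, x_end) span of the line.

    Assumption: `components` are (x_start, x_end) pairs sorted left to right.
    The n-1 widest gaps are taken as word boundaries, n = number of words.
    """
    words = transcript.split()
    if not words or len(words) > len(components):
        raise ValueError("need at least one component per transcript word")
    # Gap i lies between component i and component i + 1.
    gaps = [components[i + 1][0] - components[i][1]
            for i in range(len(components) - 1)]
    ranked = sorted(range(len(gaps)), key=lambda i: gaps[i], reverse=True)
    cuts = sorted(ranked[:len(words) - 1])
    spans, start = [], 0
    # Close a word after each cut, and after the last component.
    for cut in cuts + [len(components) - 1]:
        spans.append((components[start][0], components[cut][1]))
        start = cut + 1
    return list(zip(words, spans))


def length_score(alignment):
    """Illustrative score in [0, 1]: agreement between relative span widths
    and relative character counts (1.0 means identical proportions)."""
    widths = [x_end - x_start for _, (x_start, x_end) in alignment]
    chars = [len(word) for word, _ in alignment]
    total_w, total_c = sum(widths), sum(chars)
    diff = sum(abs(w / total_w - c / total_c) for w, c in zip(widths, chars))
    return 1.0 - 0.5 * diff


# Hypothetical line with five components and a three-word transcript.
line = [(0, 10), (12, 30), (45, 60), (62, 80), (100, 130)]
result = align_line(line, "to be or")
print(result, length_score(result))
```

A score of this kind could be used to rank competing alignment hypotheses for the same line; how the paper actually combines its two approaches is not stated here.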
