Viral transcript alignment

We present an end-to-end system for aligning the letters of a transcript to their coordinates in a manuscript image. An intuitive GUI and an automatic line detection method enable the user to align parts of document pages exactly. To bridge the large regions between annotations and to augment the manual effort, the system employs an optical-flow engine that directly matches, at the pixel level, the image of a line of historical text with a synthetic image created from the matching line of the transcript. Meanwhile, by accumulating aligned letters and performing letter spotting, the system is able to bootstrap a rapid semi-automatic transcription of the remaining text. Thus, the amount of manual work is greatly reduced, and the transcript alignment task becomes practical regardless of corpus size.
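
The idea of matching a manuscript line against a synthetic rendering of its transcript line can be illustrated with a deliberately simplified sketch. It is not the system's optical-flow engine: as a stand-in, it aligns the two line images in one dimension only, applying dynamic time warping to their per-column ink counts, and then transfers the known letter start columns of the synthetic image to the manuscript image. The function names (`column_profile`, `dtw_path`, `transfer_boundaries`) and the toy profiles are hypothetical and chosen for illustration.

```python
def column_profile(image):
    """Count ink pixels per column; image is a list of rows, 1 = ink, 0 = background."""
    return [sum(column) for column in zip(*image)]


def dtw_path(a, b):
    """Return a minimum-cost monotone alignment of sequences a and b as (i, j) index pairs."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])
    # Backtrack from the end, always moving to the cheapest predecessor.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return path


def transfer_boundaries(synthetic_profile, manuscript_profile, synthetic_starts):
    """Map each synthetic letter start column to the first manuscript column aligned with it."""
    first_match = {}
    for i, j in dtw_path(synthetic_profile, manuscript_profile):
        first_match.setdefault(i, j)
    return [first_match[s] for s in synthetic_starts]


# Toy example (assumed data): two "letters" in the synthetic line start at
# columns 1 and 4; the manuscript line shows the same pattern, stretched.
synthetic = [0, 3, 3, 0, 2, 2, 0]
manuscript = [0, 0, 3, 3, 3, 0, 0, 2, 2, 2, 0]
manuscript_starts = transfer_boundaries(synthetic, manuscript, [1, 4])
```

A dense two-dimensional flow field, as the abstract describes, can also absorb vertical displacement and local shape variation that a column-wise warp ignores; the sketch only shows how correspondences, once found, carry letter coordinates from the synthetic image to the manuscript.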
