Start, Follow, Read: End-to-End Full-Page Handwriting Recognition

Despite decades of research, offline handwriting recognition (HWR) of degraded historical documents remains a challenging problem, which, if solved, could greatly improve the searchability of online cultural heritage archives. HWR models are often limited by the accuracy of the preceding steps of text detection and segmentation. Motivated by this, we present a deep learning model that jointly learns text detection, segmentation, and recognition using mostly images without detection or segmentation annotations. Our Start, Follow, Read (SFR) model is composed of three parts: a Region Proposal Network that finds the start position of each text line, a novel line follower network that incrementally follows (possibly curved) lines of text and preprocesses them into dewarped images, and a CNN-LSTM network that recognizes the text in those dewarped images. SFR exceeds the performance of the winner of the ICDAR2017 handwriting recognition competition, even when it does not use the provided competition region annotations.
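The abstract describes the line follower only at a high level. In SFR it is a learned network that predicts successive positions along a line, and its output is a dewarped line image. The sketch below illustrates only the final resampling idea under a simplifying assumption: the path along the text line is already known as a list of (x, y) points (here computed analytically instead of predicted), and each output column is sampled along the local normal to that path with bilinear interpolation. The names `bilinear_sample`, `dewarp_line`, `make_curved_line_image`, and the `half_height` parameter are hypothetical. This is not the SFR implementation.

```python
import numpy as np


def bilinear_sample(img, x, y):
    """Sample a 2-D grayscale image at float coordinates with bilinear
    interpolation. Coordinates outside the image contribute 0 (background)."""
    h, w = img.shape
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    dx, dy = x - x0, y - y0
    out = np.zeros(np.shape(x), dtype=float)
    # Accumulate the four neighbouring pixels, each weighted by its overlap.
    for ox, oy, wgt in ((0, 0, (1 - dx) * (1 - dy)), (1, 0, dx * (1 - dy)),
                        (0, 1, (1 - dx) * dy), (1, 1, dx * dy)):
        xi, yi = x0 + ox, y0 + oy
        valid = (xi >= 0) & (xi < w) & (yi >= 0) & (yi < h)
        out[valid] += wgt[valid] * img[yi[valid], xi[valid]]
    return out


def dewarp_line(img, path, half_height=8):
    """Resample a (possibly curved) text line into a straight strip.

    `path` is an (N, 2) array of (x, y) points along the line in reading order.
    ASSUMPTION: the path is given; in SFR the line follower predicts it.
    Output column j is sampled along the normal to the path at point j, so the
    result has shape (2 * half_height + 1, N) with the line in the middle row.
    """
    path = np.asarray(path, dtype=float)
    # Unit tangent from finite differences (one-sided at the endpoints).
    tangent = np.gradient(path, axis=0)
    tangent /= np.linalg.norm(tangent, axis=1, keepdims=True)
    # Rotate the tangent by 90 degrees; with y pointing down, the normal of a
    # left-to-right horizontal line points down the page, keeping text upright.
    normal = np.stack([-tangent[:, 1], tangent[:, 0]], axis=1)
    offsets = np.arange(-half_height, half_height + 1, dtype=float)
    xs = path[None, :, 0] + offsets[:, None] * normal[None, :, 0]
    ys = path[None, :, 1] + offsets[:, None] * normal[None, :, 1]
    return bilinear_sample(img, xs, ys)


def make_curved_line_image(width=128, height=64):
    """Synthetic page: one 3-pixel-thick sinusoidal stroke and its centerline."""
    img = np.zeros((height, width))
    cols = np.arange(width)
    centers = 30 + 10 * np.sin(cols / 15)
    for x, cy in zip(cols, centers):
        r = int(round(cy))
        img[r - 1:r + 2, x] = 1.0
    # Keep a margin at both ends so normal offsets stay inside the image.
    path = np.stack([cols[10:118].astype(float), centers[10:118]], axis=1)
    return img, path


if __name__ == "__main__":
    img, path = make_curved_line_image()
    strip = dewarp_line(img, path, half_height=6)
    print(strip.shape)     # (13, 108)
    print(strip[6].min())  # close to 1.0: the middle row lies on the stroke
```

Sampling along normals rather than straight pixel columns is what removes the curvature, so a downstream recognizer sees a horizontal line. Bilinear interpolation is used here so the output varies smoothly with the path coordinates; a learned follower would supply `path` instead of the analytic sine curve.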
