Field Typing for Improved Recognition on Heterogeneous Handwritten Forms

Offline handwriting recognition has made steady progress over the past decades. However, existing methods are typically benchmarked on free-form text datasets that are biased towards good-quality images and handwriting, and towards homogeneous content. In this paper, we show that state-of-the-art algorithms, which employ long short-term memory (LSTM) layers, do not readily generalize to real-world structured documents such as forms. This is due to the highly heterogeneous and out-of-vocabulary content of such documents, and to the inherent ambiguities of this content. To address this, we propose to leverage the content type of each field within an LSTM-based architecture. Furthermore, we introduce a procedure to generate synthetic data for training this architecture without requiring expensive manual annotations. We demonstrate the effectiveness of our approach at transcribing text on a challenging, real-world dataset of European Accident Statements.
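The abstract does not specify how the content type enters the LSTM-based architecture. One simple option, assumed here purely for illustration, is to append a one-hot field-type vector to every frame of the feature sequence before it reaches the LSTM layers. This makes the type visible at each time step. The type inventory (`FIELD_TYPES`) and the helper names `one_hot` and `condition_on_type` are hypothetical, and the sketch is not presented as the authors' method.

```python
from typing import List, Sequence

# Hypothetical field-type inventory; the paper's actual types are not listed here.
FIELD_TYPES: List[str] = ["date", "time", "licence_plate", "name", "free_text"]


def one_hot(field_type: str, types: Sequence[str] = FIELD_TYPES) -> List[float]:
    """Encode `field_type` as a one-hot vector over `types`."""
    if field_type not in types:
        raise ValueError(f"unknown field type: {field_type!r}")
    return [1.0 if t == field_type else 0.0 for t in types]


def condition_on_type(frames: Sequence[Sequence[float]], field_type: str) -> List[List[float]]:
    """Append the field-type one-hot vector to every frame of a feature sequence.

    `frames` is a T x D sequence (e.g. column features extracted from a
    text-line image). The result is T x (D + len(FIELD_TYPES)) and could be
    fed to an LSTM in place of the unconditioned features.
    """
    code = one_hot(field_type)
    return [list(frame) + code for frame in frames]


if __name__ == "__main__":
    feats = [[0.2, 0.7], [0.1, 0.9], [0.5, 0.4]]  # 3 frames, 2 features each
    conditioned = condition_on_type(feats, "date")
    print(len(conditioned), len(conditioned[0]))  # prints: 3 7
```

Per-frame concatenation is only one design choice. Another common option is to use a type embedding to initialize the LSTM's hidden state, which injects the type once rather than at every step.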
