A limited-size ensemble of homogeneous CNN/LSTMs for high-performance word classification

The strength of long short-term memory (LSTM) neural networks lies more in handling sequences of variable length than in handling geometric variability of image patterns. In this paper, an end-to-end convolutional LSTM neural network is used to handle both geometric variation and sequence variability. The best results for LSTMs are often based on large-scale training of an ensemble of network instances. We show that high performance can be reached on a common benchmark set with just five such networks, given proper data augmentation, a proper coding scheme and a proper voting scheme. The networks have similar architectures: a convolutional neural network (CNN) of five layers and a bidirectional LSTM (BiLSTM) of three layers, followed by a connectionist temporal classification (CTC) processing step. The approach assumes differently scaled input images and different feature map sizes. Three datasets are used: the standard benchmark RIMES dataset (French), the historical handwritten dataset KdK (Dutch), and the standard benchmark George Washington (GW) dataset (English). The final performance obtained on the word-recognition test of RIMES was 96.6%, a clear improvement over other state-of-the-art approaches that did not use a pre-trained network. On the KdK and GW datasets, our approach also shows good results. The proposed approach is deployed in the Monk search engine for historical-handwriting collections.
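As an illustration of how the CTC step and the voting step over five networks might fit together, the following is a minimal sketch and not the authors' implementation. It assumes best-path (greedy) CTC decoding and a simple plurality vote; the abstract does not specify either choice. The blank symbol `-`, the function names `ctc_greedy_decode` and `plurality_vote`, and the example frame labels are all hypothetical.

```python
from collections import Counter
from itertools import groupby

BLANK = "-"  # hypothetical CTC blank symbol, chosen for this sketch only


def ctc_greedy_decode(frame_labels, blank=BLANK):
    """Best-path CTC decoding: collapse repeated labels, then drop blanks."""
    collapsed = (label for label, _ in groupby(frame_labels))
    return "".join(label for label in collapsed if label != blank)


def plurality_vote(hypotheses):
    """Return the most frequent hypothesis.

    Ties are broken in favour of the hypothesis that appears first
    in the input list (an assumption of this sketch).
    """
    counts = Counter(hypotheses)
    best = max(counts.values())
    for hypothesis in hypotheses:
        if counts[hypothesis] == best:
            return hypothesis


# Hypothetical per-frame argmax labels from five network instances
# for the same word image.
network_outputs = [
    "bb-oo-n-jj-oo-uu-r",
    "b-o-n-j-o-u-r",
    "bb-oo-nn-j--oo-u-rr",
    "b-o-u-j-o-u-r",
    "bon-j-o-r",
]

decoded = [ctc_greedy_decode(seq) for seq in network_outputs]
winner = plurality_vote(decoded)
print(decoded)
print(winner)
```

In this toy example three of the five decoded strings agree, so the vote recovers the majority word even though two networks made a character error. A real system would typically also weigh network confidences or verify candidates against a lexicon.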
