Handwriting Recognition in Low-Resource Scripts Using Adversarial Learning

Handwritten Word Recognition and Spotting is a challenging field dealing with handwritten text possessing irregular and complex shapes. The design of deep neural network models makes it necessary to extend training datasets in order to introduce variations and increase the number of samples; word-retrieval is therefore very difficult in low-resource scripts. Much of the existing literature comprises preprocessing strategies which are seldom sufficient to cover all possible variations. We propose an Adversarial Feature Deformation Module (AFDM) that learns ways to elastically warp extracted features in a scalable manner. The AFDM is inserted between intermediate layers and trained alternatively with the original framework, boosting its capability to better learn highly informative features rather than trivial ones. We test our meta-framework, which is built on top of popular word-spotting and word-recognition frameworks and enhanced by AFDM, not only on extensive Latin word datasets but also on sparser Indic scripts. We record results for varying sizes of training data, and observe that our enhanced network generalizes much better in the low-data regime; the overall word-error rates and mAP scores are observed to improve as well.

[1]  Sergey Ioffe,et al.  Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[2]  Jérôme Louradour,et al.  Scan, Attend and Read: End-to-End Handwritten Paragraph Recognition with MDLSTM Attention , 2016, 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR).

[3]  Minh N. Do,et al.  Semantic Image Inpainting with Perceptual and Contextual Losses , 2016, ArXiv.

[4]  Jian Sun,et al.  Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[5]  Alexei A. Efros,et al.  Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[6]  Simon Haykin,et al.  GradientBased Learning Applied to Document Recognition , 2001 .

[7]  Fred L. Bookstein,et al.  Principal Warps: Thin-Plate Splines and the Decomposition of Deformations , 1989, IEEE Trans. Pattern Anal. Mach. Intell..

[8]  Haikal El Abed,et al.  ICDAR 2011 - French Handwriting Recognition Competition , 2011, 2011 International Conference on Document Analysis and Recognition.

[9]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[10]  Yoshua Bengio,et al.  Gradient-based learning applied to document recognition , 1998, Proc. IEEE.

[11]  Xiang Bai,et al.  An End-to-End Trainable Neural Network for Image-Based Sequence Recognition and Its Application to Scene Text Recognition , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[12]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[13]  Anders Brun,et al.  Neural Ctrl-F: Segmentation-Free Query-by-String Word Spotting in Handwritten Manuscript Collections , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[14]  Yoshua Bengio,et al.  Generative Adversarial Nets , 2014, NIPS.

[15]  Xiang Bai,et al.  ASTER: An Attentional Scene Text Recognizer with Flexible Rectification , 2019, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[16]  Li Fei-Fei,et al.  ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[17]  Prasenjit Dey,et al.  HMM-based Indic handwritten word recognition using zone segmentation , 2016, Pattern Recognit..

[18]  Amos J. Storkey,et al.  Data Augmentation Generative Adversarial Networks , 2017, ICLR 2018.

[19]  Ernest Valveny,et al.  Word Spotting and Recognition with Embedded Attributes , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[20]  Anders Brun,et al.  Semantic and Verbatim Word Spotting Using Deep Neural Networks , 2016, 2016 15th International Conference on Frontiers in Handwriting Recognition (ICFHR).

[21]  Xiang Bai,et al.  Robust Scene Text Recognition with Automatic Rectification , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[22]  Gernot A. Fink,et al.  Evaluating Word String Embeddings and Loss Functions for CNN-Based Word Spotting , 2017, 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR).

[23]  C. V. Jawahar,et al.  Deep Feature Embedding for Accurate Recognition and Retrieval of Handwritten Text , 2016, 2016 15th International Conference on Frontiers in Handwriting Recognition (ICFHR).

[24]  Minh N. Do,et al.  Semantic Image Inpainting with Deep Generative Models , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[25]  Trevor Darrell,et al.  Adversarial Discriminative Domain Adaptation , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[26]  Bidyut Baran Chaudhuri,et al.  Indian script character recognition: a survey , 2004, Pattern Recognit..

[27]  Lior Wolf,et al.  CNN-N-Gram for HandwritingWord Recognition , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[28]  Andrew Zisserman,et al.  Spatial Transformer Networks , 2015, NIPS.

[29]  P. Cochat,et al.  Et al , 2008, Archives de pediatrie : organe officiel de la Societe francaise de pediatrie.

[30]  Nitish Srivastava,et al.  Dropout: a simple way to prevent neural networks from overfitting , 2014, J. Mach. Learn. Res..

[31]  Rynson W. H. Lau,et al.  VITAL: VIsual Tracking via Adversarial Learning , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[32]  Horst Bunke,et al.  The IAM-database: an English sentence database for offline handwriting recognition , 2002, International Journal on Document Analysis and Recognition.

[33]  Gernot A. Fink,et al.  PHOCNet: A Deep Convolutional Neural Network for Word Spotting in Handwritten Documents , 2016, 2016 15th International Conference on Frontiers in Handwriting Recognition (ICFHR).

[34]  Gernot A. Fink,et al.  Attribute CNNs for word spotting in handwritten documents , 2017, International Journal on Document Analysis and Recognition (IJDAR).

[35]  Lianwen Jin,et al.  High performance offline handwritten Chinese character recognition using GoogLeNet and directional feature maps , 2015, 2015 13th International Conference on Document Analysis and Recognition (ICDAR).

[36]  Jonathon Shlens,et al.  Conditional Image Synthesis with Auxiliary Classifier GANs , 2016, ICML.

[37]  Nitish Srivastava Unsupervised Learning of Visual Representations using Videos , 2015 .

[38]  Iasonas Kokkinos,et al.  Fracking Deep Convolutional Image Descriptors , 2014, ArXiv.

[39]  Avleen Singh Bijral,et al.  Mini-Batch Primal and Dual Methods for SVMs , 2013, ICML.

[40]  拓海 杉山,et al.  “Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks”の学習報告 , 2017 .

[41]  Sargur N. Srihari,et al.  On-Line and Off-Line Handwriting Recognition: A Comprehensive Survey , 2000, IEEE Trans. Pattern Anal. Mach. Intell..

[42]  Jürgen Schmidhuber,et al.  Long Short-Term Memory , 1997, Neural Computation.

[43]  Hermann Ney,et al.  Handwriting Recognition with Large Multidimensional Long Short-Term Memory Recurrent Neural Networks , 2016, 2016 15th International Conference on Frontiers in Handwriting Recognition (ICFHR).

[44]  C. V. Jawahar,et al.  Word Spotting and Recognition Using Deep Embedding , 2018, 2018 13th IAPR International Workshop on Document Analysis Systems (DAS).

[45]  Lianwen Jin,et al.  A Multi-Object Rectified Attention Network for Scene Text Recognition , 2019, Pattern Recognit..

[46]  Jürgen Schmidhuber,et al.  Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks , 2006, ICML.

[47]  Dan Ciresan,et al.  Multi-Column Deep Neural Networks for offline handwritten Chinese character classification , 2013, 2015 International Joint Conference on Neural Networks (IJCNN).

[48]  Patrice Y. Simard,et al.  Best practices for convolutional neural networks applied to visual document analysis , 2003, Seventh International Conference on Document Analysis and Recognition, 2003. Proceedings..

[49]  Yike Guo,et al.  Unsupervised Image-to-Image Translation with Generative Adversarial Networks , 2017, ArXiv.

[50]  Abhinav Gupta,et al.  A-Fast-RCNN: Hard Positive Generation via Adversary for Object Detection , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[51]  Frank Hutter,et al.  Online Batch Selection for Faster Training of Neural Networks , 2015, ArXiv.

[52]  Bidyut Baran Chaudhuri,et al.  Synthetic data generation for Indic handwritten text recognition , 2018, ArXiv.

[53]  Lior Wolf,et al.  Confidence Prediction for Lexicon-Free OCR , 2018, 2018 IEEE Winter Conference on Applications of Computer Vision (WACV).

[54]  Umapada Pal,et al.  Cross-language Framework for Word Recognition and Spotting of Indic Scripts , 2017, Pattern Recognit..

[55]  Andrew Zisserman,et al.  Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition , 2014, ArXiv.

[56]  Abhinav Gupta,et al.  Training Region-Based Object Detectors with Online Hard Example Mining , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).