Handwritten Mathematical Expression Recognition via Paired Adversarial Learning

Recognition of handwritten mathematical expressions (MEs) is an important problem with wide practical applications. It is challenging because of the variety of writing styles and ME formats; as a result, recognizers trained by optimizing the traditional supervised loss do not perform satisfactorily. To improve the robustness of the recognizer to writing styles, we propose a novel paired adversarial learning method for learning semantic-invariant features. Specifically, our proposed model, named PAL-v2, consists of an attention-based recognizer and a discriminator. During training, handwritten MEs and their printed templates are fed into PAL-v2 simultaneously, and the attention-based recognizer is trained to learn semantic-invariant features under the guidance of the discriminator. Moreover, we adopt a convolutional decoder to alleviate the vanishing and exploding gradient problems of RNN-based decoders, and we further improve the coverage of decoding with a novel attention method. We conducted extensive experiments on the CROHME dataset to demonstrate the effectiveness of each part of the method and achieved state-of-the-art performance.
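The paired training objective described above can be illustrated with a minimal sketch. It is not the PAL-v2 implementation: the linear discriminator, the non-saturating adversarial term, the weight `lam`, and all function names are assumptions chosen for illustration, and the recognition loss is passed in as a plain number rather than computed by an attention-based decoder.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def discriminator_prob(feature, weights, bias):
    """Hypothetical linear discriminator: probability that a feature
    vector was extracted from a printed template (not handwriting)."""
    score = sum(w * f for w, f in zip(weights, feature)) + bias
    return sigmoid(score)


def discriminator_loss(p_printed, p_handwritten):
    """Binary cross-entropy with printed features labelled 1 and
    handwritten features labelled 0."""
    return -(math.log(p_printed) + math.log(1.0 - p_handwritten))


def adversarial_loss(p_handwritten):
    """Recognizer-side term (assumed non-saturating form): small when
    the discriminator mistakes handwritten features for printed ones."""
    return -math.log(p_handwritten)


def recognizer_loss(recognition_nll, p_handwritten, lam=0.5):
    """Recognition loss plus the adversarial term, weighted by the
    hypothetical trade-off coefficient lam."""
    return recognition_nll + lam * adversarial_loss(p_handwritten)


# Toy paired example: features of a printed template and of two
# handwritten renderings of the same expression.
weights, bias = [1.0, -1.0], 0.0
printed = [2.0, 0.0]
handwritten_far = [0.0, 2.0]    # style-dependent feature
handwritten_near = [1.9, 0.1]   # close to the printed feature

p_far = discriminator_prob(handwritten_far, weights, bias)
p_near = discriminator_prob(handwritten_near, weights, bias)

# Features close to the printed template incur a smaller adversarial
# penalty, which is the pressure toward style-invariant features.
print(adversarial_loss(p_far) > adversarial_loss(p_near))
```

In this sketch the discriminator and the recognizer would be updated alternately, each minimizing its own loss; how the two updates are scheduled and weighted in practice is not specified here.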
