Stroke Constrained Attention Network for Online Handwritten Mathematical Expression Recognition

In this paper, we propose a novel stroke constrained attention network (SCAN), which treats the stroke as the basic unit for encoder-decoder based online handwritten mathematical expression recognition (HMER). Unlike previous methods, which use trace points or image pixels as basic units, SCAN makes full use of stroke-level information for better alignment and representation. SCAN can be adopted in both single-modal (online or offline) and multi-modal HMER. For single-modal HMER, SCAN first extracts point-level features from input traces with a CNN-GRU encoder in online mode, or pixel-level features from input images with a CNN encoder in offline mode. It then uses stroke constrained information to convert these into online or offline stroke-level features. Stroke-level features explicitly group the points or pixels that belong to the same stroke, which reduces the difficulty of symbol segmentation and recognition for the attention-based decoder. For multi-modal HMER, in addition to fusing multi-modal information in the decoder, SCAN can also fuse it in the encoder by exploiting the stroke-based alignment between the online and offline modalities. Encoder fusion is a better way to combine multi-modal information because the modalities interact one step earlier than in decoder fusion, so the advantages of each modality can be exploited earlier and more fully when training the encoder-decoder model. Evaluated on a benchmark published by the CROHME competition, SCAN achieves state-of-the-art performance.
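The two stroke-based steps described above, grouping point-level (or pixel-level) features into stroke-level features and then fusing the online and offline modalities stroke by stroke in the encoder, can be illustrated with the minimal sketch below. It works on plain Python lists rather than learned CNN/GRU features. Several details are assumptions, not the authors' exact method: the function names `stroke_pool` and `encoder_fusion` are hypothetical, aggregation is assumed to be mean pooling, fusion is assumed to be simple concatenation, and every point or pixel is assumed to be tagged with a stroke id beforehand (background pixels already dropped).

```python
from collections import defaultdict
from typing import Dict, List, Sequence

Vector = List[float]


def stroke_pool(point_feats: Sequence[Vector], stroke_ids: Sequence[int]) -> Dict[int, Vector]:
    """Group point- or pixel-level features into one vector per stroke.

    Assumption (hypothetical): mean pooling over all features tagged with the
    same stroke id; the paper's exact aggregation may differ.
    """
    if len(point_feats) != len(stroke_ids):
        raise ValueError("each feature vector needs exactly one stroke id")
    sums: Dict[int, Vector] = {}
    counts: Dict[int, int] = defaultdict(int)
    for feat, sid in zip(point_feats, stroke_ids):
        if sid not in sums:
            sums[sid] = [0.0] * len(feat)
        acc = sums[sid]
        for i, v in enumerate(feat):
            acc[i] += v  # accumulate element-wise sum for this stroke
        counts[sid] += 1
    # divide each stroke's sum by its number of points/pixels
    return {sid: [v / counts[sid] for v in acc] for sid, acc in sums.items()}


def encoder_fusion(online: Dict[int, Vector], offline: Dict[int, Vector]) -> List[Vector]:
    """Fuse the two modalities stroke by stroke, ordered by stroke id.

    Because both modalities share the same strokes, each online stroke vector
    aligns with exactly one offline stroke vector. Assumption (hypothetical):
    fusion by concatenation.
    """
    if online.keys() != offline.keys():
        raise ValueError("online and offline features must cover the same strokes")
    return [online[sid] + offline[sid] for sid in sorted(online)]


if __name__ == "__main__":
    # Three trace points: the first two belong to stroke 0, the third to stroke 1.
    online = stroke_pool([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], [0, 0, 1])
    print(online)  # {0: [2.0, 3.0], 1: [5.0, 6.0]}
    # Pixel features from an offline encoder, each tagged with the stroke covering it.
    offline = stroke_pool([[0.5], [1.5], [4.0]], [0, 0, 1])
    print(offline)  # {0: [1.0], 1: [4.0]}
    print(encoder_fusion(online, offline))  # [[2.0, 3.0, 1.0], [5.0, 6.0, 4.0]]
```

The stroke ids are what make encoder fusion possible: without a shared stroke index, there would be no one-to-one correspondence between trace points and image pixels, and the modalities could only be combined later, in the decoder.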
