Handwritten Mathematical Expression Recognition with Bidirectionally Trained Transformer

Encoder-decoder models have made great progress on handwritten mathematical expression recognition recently. However, it is still a challenge for existing methods to assign attention to image features accurately. Moreover, those encoder-decoder models usually adopt RNN-based models in their decoder part, which makes them inefficient in processing long LaTeX sequences. In this paper, a transformer-based decoder is employed to replace RNN-based ones, which makes the whole model architecture very concise. Furthermore, a novel training strategy is introduced to fully exploit the potential of the transformer in bidirectional language modeling. Compared to several methods that do not use data augmentation, experiments demonstrate that our model improves the ExpRate of current state-of-the-art methods on CROHME 2014 by 2.23%. Similarly, on CROHME 2016 and CROHME 2019, we improve the ExpRate by 1.92% and 2.28% respectively.
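For readers unfamiliar with the metric, ExpRate is conventionally understood as the percentage of test expressions whose predicted LaTeX sequence matches the ground truth exactly. The following is a minimal sketch under that assumption; the function name `exp_rate` and the token-list input format are hypothetical and not taken from the paper, and the official CROHME evaluation tooling may compare expression structure rather than raw token sequences.

```python
from typing import List, Sequence


def exp_rate(predictions: Sequence[List[str]], references: Sequence[List[str]]) -> float:
    """Return the percentage of predictions that match their reference exactly.

    Assumption: each expression is a list of LaTeX tokens, and a prediction
    counts as correct only if every token matches in order.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    # Count expressions whose full token sequence is identical to the reference.
    correct = sum(1 for p, r in zip(predictions, references) if p == r)
    return 100.0 * correct / len(references)


# Illustrative data only: one exact match out of two expressions.
preds = [["x", "^", "2"], ["a", "+", "b"]]
refs = [["x", "^", "2"], ["a", "-", "b"]]
score = exp_rate(preds, refs)
```

Under this reading, the improvements quoted above are most naturally interpreted as differences between two such percentages.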
