Image-to-Markup Generation with Coarse-to-Fine Attention

We present a neural encoder-decoder model to convert images into presentational markup based on a scalable coarse-to-fine attention mechanism. Our method is evaluated in the context of image-to-LaTeX generation, and we introduce a new dataset of real-world rendered mathematical expressions paired with LaTeX markup. We show that, unlike neural OCR techniques using CTC-based (connectionist temporal classification) models, attention-based approaches can tackle this non-standard OCR task. Our approach outperforms classical mathematical OCR systems by a large margin on in-domain rendered data and, with pretraining, also performs well on out-of-domain handwritten data. To reduce the inference complexity associated with attention-based approaches, we introduce a new coarse-to-fine attention layer that selects a support region before applying attention.

Alexander M. Rush | Yuntian Deng | Jeffrey Ling | A. Kanervisto
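The abstract describes the coarse-to-fine layer only at a high level: select a support region, then apply attention inside it. The following is a minimal plain-Python sketch of that idea under assumptions that do not come from the paper. It assumes the image features form an H×W grid, that coarse features are the mean of each b×b patch, and that the support region is the single coarse cell with the highest dot-product score (a hard argmax). The names pool_coarse, coarse_to_fine_attention and patch and the block parameter are hypothetical. This is an illustration, not the authors' implementation.

```python
import math
from typing import List, Tuple

Vector = List[float]
Grid = List[List[Vector]]  # grid[row][col] -> feature vector


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max score for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def patch(grid: Grid, bi: int, bj: int, block: int) -> List[Vector]:
    # Fine feature vectors inside coarse cell (bi, bj), in row-major order.
    return [grid[bi * block + i][bj * block + j]
            for i in range(block) for j in range(block)]


def pool_coarse(fine: Grid, block: int) -> Grid:
    """Hypothetical coarse features: the mean of each block x block patch.

    Computed once per image, so this cost is not paid at every decoding step.
    """
    rows, cols, dim = len(fine), len(fine[0]), len(fine[0][0])
    if rows % block or cols % block:
        raise ValueError("grid size must be a multiple of block")
    coarse = []
    for bi in range(rows // block):
        row = []
        for bj in range(cols // block):
            cell = patch(fine, bi, bj, block)
            row.append([sum(v[d] for v in cell) / len(cell) for d in range(dim)])
        coarse.append(row)
    return coarse


def coarse_to_fine_attention(
    fine: Grid, coarse: Grid, query: Vector, block: int
) -> Tuple[Vector, Tuple[int, int], List[float]]:
    """One decoding step: pick a support region, then attend inside it."""
    # Coarse pass: score each coarse cell and keep the best one (hard argmax,
    # an assumption; the abstract does not say how the region is chosen).
    cells = [(bi, bj) for bi in range(len(coarse)) for bj in range(len(coarse[0]))]
    best = max(cells, key=lambda c: dot(query, coarse[c[0]][c[1]]))

    # Fine pass: ordinary softmax attention restricted to the selected cell.
    support = patch(fine, best[0], best[1], block)
    weights = softmax([dot(query, v) for v in support])
    dim = len(query)
    context = [sum(w * v[d] for w, v in zip(weights, support)) for d in range(dim)]
    return context, best, weights


if __name__ == "__main__":
    # 4x4 grid of 2-d features; only fine cell (3, 2) carries signal.
    fine = [[[0.0, 0.0] for _ in range(4)] for _ in range(4)]
    fine[3][2] = [5.0, 0.0]
    coarse = pool_coarse(fine, block=2)
    context, cell, weights = coarse_to_fine_attention(fine, coarse, [1.0, 0.0], block=2)
    print(cell)  # (1, 1): the bottom-right coarse cell
```

At each decoding step, full attention over an H×W grid computes H·W scores. This sketch instead computes (H/b)·(W/b) coarse scores plus b² fine scores. For the 4×4 example with b = 2, that is 4 + 4 = 8 scores instead of 16. A hard argmax is not differentiable, so training such a selection step end to end needs an extra mechanism (for example, a sampling-based gradient estimator), which the sketch does not cover.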
