Paper Information - What You Get Is What You See: A Visual Markup Decompiler

Building on recent advances in image caption generation and optical character recognition (OCR), we present a general-purpose, deep learning-based system to decompile an image into presentational markup. While this task is a well-studied problem in OCR, our method takes an inherently different, data-driven approach. Our model does not require any knowledge of the underlying markup language, and is simply trained end-to-end on real-world example data. The model employs a convolutional network for text and layout recognition in tandem with an attention-based neural machine translation system. To train and evaluate the model, we introduce a new dataset of real-world rendered mathematical expressions paired with LaTeX markup, as well as a synthetic dataset of web pages paired with HTML snippets. Experimental results show that the system is surprisingly effective at generating accurate markup for both datasets. While a standard domain-specific LaTeX OCR system achieves around 25% accuracy, our model reproduces the exact rendered image on 75% of examples.
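The abstract pairs a convolutional network with an attention-based decoder. As a rough illustration of that pairing, the following minimal sketch computes one attention step over a grid of image features. It is not the authors' implementation: the function name `attend`, the bilinear scoring matrix `W`, and all array sizes are assumptions chosen for the example, and random arrays stand in for real CNN outputs and decoder states.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    shifted = x - np.max(x)
    e = np.exp(shifted)
    return e / e.sum()


def attend(features, query, W):
    """One attention step over a CNN feature grid (illustrative only).

    features: (H, W_grid, D) feature map, e.g. produced by a CNN
    query:    (Q,) current decoder state
    W:        (Q, D) bilinear scoring matrix (an assumed scoring function)
    Returns the (D,) context vector and the (H, W_grid) attention weights.
    """
    H, W_grid, D = features.shape
    flat = features.reshape(H * W_grid, D)   # one row per grid cell
    scores = flat @ (W.T @ query)            # one score per grid cell
    weights = softmax(scores)                # non-negative, sums to 1
    context = weights @ flat                 # weighted average of the cells
    return context, weights.reshape(H, W_grid)


# Hypothetical sizes: a 4x16 feature grid with 8 channels, decoder state of size 6.
rng = np.random.default_rng(0)
features = rng.normal(size=(4, 16, 8))
query = rng.normal(size=(6,))
W = rng.normal(size=(6, 8))

context, weights = attend(features, query, W)
print(context.shape, weights.shape)
```

In a full decoder, a step like this would be repeated for every generated markup token, so that each token is conditioned on a different region of the image.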

Alexander M. Rush | Anssi Kanervisto | Yuntian Deng
