Structural Analysis of Mathematical Formulae with Verification Based on Formula Description Grammar

In this paper, a reliable and efficient structural analysis method for mathematical formulae is proposed for practical mathematical OCR. The proposed method consists of three steps. In the first step, a fast structural analysis algorithm is applied to each mathematical formula to obtain a tree representation of the formula. This step usually yields a correct tree representation but sometimes yields an erroneous one, so the tree representation is verified by the following two steps. In the second step, the result of the analysis step (i.e., the tree representation) is converted into a one-dimensional representation. The third step is a verification step in which the one-dimensional representation is parsed with a formula description grammar, which is a context-free grammar specialized for mathematical formulae. If the one-dimensional representation is not accepted by the grammar, the result of the analysis step is flagged as erroneous and reported to the OCR user. This three-step organization achieves reliable and efficient structural analysis without requiring any two-dimensional grammar.
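The second and third steps can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the `Node` tree layout, the LaTeX-like token format produced by `linearize`, and the toy grammar recognized by `Verifier` are all hypothetical stand-ins for the actual tree representation and formula description grammar, which are far richer.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical tree node: one symbol plus optional script subtrees."""
    symbol: str
    sup: Optional["Node"] = None    # superscript subtree
    sub: Optional["Node"] = None    # subscript subtree
    right: Optional["Node"] = None  # next symbol on the same baseline


def linearize(node: Optional[Node]) -> List[str]:
    """Step 2 (assumed format): flatten the tree into a 1-D token list."""
    tokens: List[str] = []
    while node is not None:
        tokens.append(node.symbol)
        if node.sub is not None:
            tokens += ["_", "{"] + linearize(node.sub) + ["}"]
        if node.sup is not None:
            tokens += ["^", "{"] + linearize(node.sup) + ["}"]
        node = node.right
    return tokens


OPERATORS = {"+", "-", "="}


class Verifier:
    """Step 3: recursive-descent recognizer for a toy context-free grammar.

        expr   -> term (OP term)*
        term   -> atom script*
        atom   -> OPERAND | "(" expr ")"
        script -> ("_" | "^") "{" expr "}"
    """

    def __init__(self, tokens: List[str]) -> None:
        self.tokens = tokens
        self.pos = 0

    def peek(self) -> Optional[str]:
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def eat(self, expected: str) -> bool:
        if self.peek() == expected:
            self.pos += 1
            return True
        return False

    def expr(self) -> bool:
        if not self.term():
            return False
        while self.peek() in OPERATORS:
            self.pos += 1
            if not self.term():
                return False
        return True

    def term(self) -> bool:
        if not self.atom():
            return False
        while self.peek() in ("_", "^"):
            self.pos += 1
            if not (self.eat("{") and self.expr() and self.eat("}")):
                return False
        return True

    def atom(self) -> bool:
        tok = self.peek()
        if tok == "(":
            self.pos += 1
            return self.expr() and self.eat(")")
        if tok is not None and tok.isalnum():
            self.pos += 1
            return True
        return False

    def accepts(self) -> bool:
        # The whole token list must be consumed for the formula to be accepted.
        return self.expr() and self.pos == len(self.tokens)


def verify(tree: Node) -> bool:
    """Return False when the analysis result should be flagged to the user."""
    return Verifier(linearize(tree)).accepts()


# A plausible analysis result for "x^2 + y".
good = Node("x", sup=Node("2"), right=Node("+", right=Node("y")))
# A hypothetical analysis error: "+" wrongly attached as a superscript of x.
bad = Node("x", sup=Node("+"), right=Node("y"))
```

Here `verify(good)` accepts the linearized form `x ^ { 2 } + y`, while `verify(bad)` rejects `x ^ { + } y`, which is the kind of result the verification step would report. The toy grammar deliberately omits implicit multiplication, fractions, radicals, and matrices, so it rejects many valid formulae that a real formula description grammar would need to cover.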
