Syntactic Detection and Correction of Misrecognitions in Mathematical OCR

This paper proposes a syntactic method for detecting and correcting misrecognized mathematical formulae in a practical mathematical OCR system. A linear monadic context-free tree grammar (LM-CFTG) is employed as the formal framework for defining syntactically acceptable mathematical formulae. For practical evaluation, a verification system is developed, and the effectiveness of the method is demonstrated using the ground-truthed mathematical document database InftyCDB-1 and a misrecognition database newly constructed for this study. A satisfactory number of misrecognitions are detected and passed to the correction process.
