INFTY: an integrated OCR system for mathematical documents

An integrated OCR system for mathematical documents, called INFTY, is presented. INFTY consists of four procedures: layout analysis, character recognition, structure analysis of mathematical expressions, and manual error correction. Several novel techniques are used in these procedures to improve recognition performance. Experiments on about 500 pages of mathematical documents showed high character recognition rates for both mathematical expressions and ordinary text, and sufficient performance in the structure analysis of mathematical expressions.
