Performance Evaluation of Mathematical Formula Identification

This paper presents a performance evaluation system for mathematical formula identification. First, a ground-truth dataset is constructed to facilitate comparison of different mathematical formula identification algorithms. Statistical analysis shows that the dataset is diverse enough to reflect real-world documents. Second, a performance evaluation metric for mathematical formula identification is proposed, comprising error-type definitions and scenario-adjustable scoring. The proposed metric enables in-depth analysis of formula identification systems in different scenarios. Finally, based on this metric, a tool is developed to evaluate mathematical formula identification results automatically. Both the ground-truth dataset and the evaluation tool are freely available for academic purposes.
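The abstract does not spell out the error types or the scoring formula, so the following is only a minimal sketch of how a metric with error-type definitions and scenario-adjustable scoring might be computed. It assumes formulas are axis-aligned bounding boxes `(x0, y0, x1, y1)`, and the five error categories (`missed`, `split`, `merged`, `partial`, `false_alarm`), the `iou_thresh` parameter, and the weight dictionaries are hypothetical illustrations, not the paper's actual definitions.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical representation: (x0, y0, x1, y1) with x0 < x1 and y0 < y1.
Box = Tuple[float, float, float, float]


def area(b: Box) -> float:
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])


def intersection(a: Box, b: Box) -> float:
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(0.0, w) * max(0.0, h)


def iou(a: Box, b: Box) -> float:
    inter = intersection(a, b)
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0


def classify(gt: List[Box], det: List[Box], iou_thresh: float = 0.8) -> Counter:
    """Assign each ground-truth formula one hypothetical error type, checked in
    the order missed, split, merged, correct/partial; detections that overlap
    no ground truth are counted as false alarms."""
    # Overlap means any positive intersection area.
    gt_hits = [[j for j, d in enumerate(det) if intersection(g, d) > 0] for g in gt]
    det_hits = [[i for i, g in enumerate(gt) if intersection(g, d) > 0] for d in det]
    counts: Counter = Counter()
    for i, hits in enumerate(gt_hits):
        if not hits:
            counts["missed"] += 1
        elif len(hits) > 1:
            counts["split"] += 1      # one formula broken into several detections
        elif len(det_hits[hits[0]]) > 1:
            counts["merged"] += 1     # one detection spans several formulas
        elif iou(gt[i], det[hits[0]]) >= iou_thresh:
            counts["correct"] += 1
        else:
            counts["partial"] += 1    # one-to-one match with loose boundaries
    counts["false_alarm"] = sum(1 for hits in det_hits if not hits)
    return counts


def score(counts: Counter, weights: Dict[str, float]) -> float:
    """Scenario-adjustable score: each ground-truth formula earns the weight of
    its error type; false alarms earn nothing but enlarge the denominator."""
    kinds = ("correct", "partial", "split", "merged", "missed")
    total = sum(counts[k] for k in kinds) + counts["false_alarm"]
    if total == 0:
        return 1.0
    credit = sum(weights.get(k, 0.0) * counts[k] for k in kinds)
    return credit / total
```

For example, a strict scenario might credit only exact matches (`{"correct": 1.0}`), whereas a retrieval-oriented scenario might also give partial credit for imprecise boundaries (`{"correct": 1.0, "partial": 0.8, "split": 0.5, "merged": 0.5}`). With one correct, one partial, one missed formula and one false alarm, these hypothetical weights give 0.25 and 0.45 respectively. Keeping the error classification separate from the weights means the same counts can be rescored for any scenario without re-matching the boxes.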
