Semi-automatic Ground Truth Generation for Chart Image Recognition

While research on scientific chart recognition is ongoing, there is no suitable standard for evaluating the overall performance of chart recognition results. This paper introduces a system for semi-automatic generation of chart ground truth. With the system, the user can extract multiple levels of ground truth data. The user's role is to verify and correct the results and to input values where necessary. The system carries out automatic tasks such as text block detection and line detection. Compared with fully manual processing, it can effectively reduce the time needed to generate ground truth data. We tested the system on 115 images. The images and the generated ground truth data are available to the public.
