ICDAR 2013 Competition on Historical Book Recognition (HBR 2013)

This paper presents an objective comparative evaluation of layout analysis and recognition methods for scanned historical books. It describes the competition (modus operandi, dataset and evaluation methodology) held in the context of ICDAR 2013 and the 2nd International Workshop on Historical Document Imaging and Processing (HIP 2013), and presents the results of evaluating five methods: three submitted to the competition and two state-of-the-art systems (one commercial and one open-source). Three scenarios are reported: the first evaluates the ability of methods to segment regions accurately, the second evaluates segmentation together with region classification (with a text extraction goal), and the third evaluates the whole pipeline, including recognition. The results indicate that layout analysis methods are converging on a common methodology, with some variations in approach. However, there is still a considerable need for robust methods that deal with the idiosyncrasies of historical books, especially for OCR.
