ICFHR 2018 Competition on Recognition of Historical Arabic Scientific Manuscripts – RASM2018

This paper presents an objective comparative evaluation of page analysis and recognition methods for historical scientific manuscripts with text in the Arabic language and script. It describes the competition (modus operandi, dataset and evaluation methodology) held in the context of ICFHR 2018 and presents the evaluation results for six methods: three submitted by participants and three baseline systems. The challenges set for the participants were page segmentation, text line detection and optical character recognition (OCR). Several evaluation metrics were used to gain insight into the behaviour of the algorithms, including new character accuracy metrics designed to better reflect the difficult conditions presented by the documents. The results indicate that, despite the challenging nature of the material, useful digitisation outputs can be produced.
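The abstract does not define the new character accuracy metrics, so the following is only a minimal sketch of the conventional edit-distance-based character accuracy that such metrics typically build on, not the competition's own measure. It assumes accuracy is computed as (n - errors) / n, where n is the number of ground-truth characters and errors is the Levenshtein distance to the OCR output. The function names `levenshtein` and `character_accuracy` are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_accuracy(ground_truth: str, ocr_output: str) -> float:
    """Conventional character accuracy: (n - errors) / n.

    Assumption: characters are Unicode code points in logical (storage)
    order, so right-to-left display of Arabic does not affect the result,
    and combining marks such as harakat count as separate characters.
    The value can be negative when the output is much longer than the
    ground truth.
    """
    n = len(ground_truth)
    if n == 0:
        raise ValueError("ground truth must be non-empty")
    return (n - levenshtein(ground_truth, ocr_output)) / n


if __name__ == "__main__":
    # One deleted letter (alif) out of four ground-truth characters.
    print(character_accuracy("كتاب", "كتب"))  # 0.75
```

Treating each code point as a character is a design choice; a metric intended for Arabic manuscripts might instead normalise the text or weight diacritics differently, which is plausibly where specialised metrics diverge from this baseline.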
