Databases and Competitions: Strategies to Improve Arabic Recognition Systems

The great success and high recognition rates of both OCR systems and recognition systems for handwritten words would be inconceivable without the availability of huge datasets of real-world data. This chapter gives a short survey of datasets used for recognition, with special focus on their application. The main part of the chapter deals with Arabic handwriting, datasets for recognition systems, and their availability. Different datasets and their usability are described, and the results of a competition are presented. Finally, a strategy for developing Arabic handwriting recognition systems based on datasets and competitions is presented.
