A Comprehensive Isolated Farsi/Arabic Character Database for Handwritten OCR Research

This paper presents a new, comprehensive database of isolated offline handwritten Farsi/Arabic numerals and characters for use in optical character recognition research. The database is freely available for academic use; to date, no comparable freely available database exists for the Farsi language. It contains grayscale images of 52,380 characters and 17,740 numerals. The images were scanned at 300 dpi from Iranian school entrance exam forms during the years 2004-2006. The only restriction imposed on the writers was to write each character within a rectangular box. The number of samples per class is non-uniform, reflecting the real-life distribution of the characters. To allow comparison between methods, each dataset has been divided into training and test sets.
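As an illustration of how a non-uniform class distribution can be carried through a training/test division, the following is a minimal sketch of a per-class (stratified) split. It is not the procedure used to build the database: the abstract does not say how the split was made, so the stratification strategy, the 20% test fraction, the function name `stratified_split`, and the toy class labels are all assumptions chosen for the example.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Split sample indices into train and test, class by class.

    Hypothetical helper: each class contributes roughly `test_fraction`
    of its samples to the test set, so both sets keep the original
    (non-uniform) class proportions.
    """
    rng = random.Random(seed)

    # Group sample indices by their class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Toy labels with a deliberately non-uniform distribution (assumed values,
# not taken from the database).
labels = ["alef"] * 50 + ["be"] * 30 + ["pe"] * 20
train_idx, test_idx = stratified_split(labels, test_fraction=0.2, seed=0)

train_counts = Counter(labels[i] for i in train_idx)
test_counts = Counter(labels[i] for i in test_idx)
print("train:", dict(train_counts))
print("test: ", dict(test_counts))
```

Splitting within each class, rather than over the pooled samples, keeps rare classes represented in both sets, which matters when class frequencies follow real-life usage.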
