Automatic recognition of handwritten Arabic using maximally stable extremal region features

Abstract. Arabic script is inherently cursive; therefore, optical character recognition (OCR) techniques developed for other scripts are generally inappropriate for Arabic. We present an OCR system for the recognition of handwritten Arabic words and make several original contributions. First, we propose a novel feature vector that represents handwritten Arabic words using the ellipsoid approximation of maximally stable extremal regions (MSERs). This feature vector is compact and robust, and it is well suited to handling writer-induced variation in handwritten script. Second, we present the Saudi Arabian city name (SACN) database, a new collection of handwritten Saudi Arabian city names and the first publicly available database of its kind. Finally, we report promising experimental results on two databases. One of the main experiments compares the proposed MSER feature vector with several well-established feature extraction methods on the SACN database; the proposed feature vector outperformed all of these methods, achieving a 100% correct identification rate. A second experiment compares our system with other recent studies on handwritten Arabic text recognition using the widely used IFN/ENIT database. Under the standard experimental protocol, our system outperformed all others, with a 92.64% correct identification rate.
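The abstract says each word is represented through an ellipsoid approximation of its MSERs but does not spell out the fit. A common choice in the MSER literature is the ellipse with the same first- and second-order moments as the region's pixels. The sketch below assumes that approach and that an MSER detector has already returned each region as a list of `(x, y)` pixel coordinates (detection itself is not shown). The names `Ellipse`, `fit_ellipse`, and `ellipse_descriptor` are hypothetical, and `ellipse_descriptor` is an illustrative per-region encoding, not the paper's feature vector.

```python
import math
from typing import Iterable, NamedTuple, Tuple


class Ellipse(NamedTuple):
    cx: float     # centroid x (column), in pixels
    cy: float     # centroid y (row), in pixels
    a: float      # semi-major axis length, in pixels
    b: float      # semi-minor axis length, in pixels
    theta: float  # major-axis angle measured from the x-axis, in radians


def fit_ellipse(pixels: Iterable[Tuple[int, int]]) -> Ellipse:
    """Approximate a pixel region by the ellipse with the same second moments."""
    pts = list(pixels)
    if not pts:
        raise ValueError("region must contain at least one pixel")
    n = len(pts)
    cx = sum(x for x, _ in pts) / n
    cy = sum(y for _, y in pts) / n
    # Second-order central moments (population covariance of the coordinates).
    sxx = sum((x - cx) ** 2 for x, _ in pts) / n
    syy = sum((y - cy) ** 2 for _, y in pts) / n
    sxy = sum((x - cx) * (y - cy) for x, y in pts) / n
    # Closed-form eigenvalues of the 2x2 covariance [[sxx, sxy], [sxy, syy]].
    mean = (sxx + syy) / 2.0
    radius = math.hypot((sxx - syy) / 2.0, sxy)
    lam1 = mean + radius
    lam2 = max(mean - radius, 0.0)  # clamp tiny negative rounding errors
    # A uniformly filled ellipse with semi-axis s has variance s**2 / 4 along
    # that axis, so s = 2 * sqrt(variance).
    a = 2.0 * math.sqrt(lam1)
    b = 2.0 * math.sqrt(lam2)
    # Angle of the eigenvector belonging to the larger eigenvalue.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    return Ellipse(cx, cy, a, b, theta)


def ellipse_descriptor(e: Ellipse, width: int, height: int) -> Tuple[float, ...]:
    """Hypothetical per-region descriptor; NOT the paper's exact feature vector."""
    scale = max(width, height)
    # Doubling the angle maps theta and theta + pi to the same values, because
    # an ellipse's orientation is only defined modulo pi.
    return (e.cx / width, e.cy / height, e.a / scale, e.b / scale,
            math.cos(2.0 * e.theta), math.sin(2.0 * e.theta))


if __name__ == "__main__":
    # Filled axis-aligned ellipse centred at (50, 30) with semi-axes 20 and 10.
    region = [(x, y) for x in range(101) for y in range(61)
              if ((x - 50) / 20) ** 2 + ((y - 30) / 10) ** 2 <= 1]
    e = fit_ellipse(region)
    print(e)
    print(ellipse_descriptor(e, 101, 61))
```

On a rasterised ellipse the recovered semi-axes come out close to, but not exactly, the true values because of pixel discretisation. Encoding the angle as `(cos 2θ, sin 2θ)` is one way to avoid a discontinuity at ±π/2; how the per-region descriptors are pooled into one fixed-length word-level vector is left open here.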
