Statistical geometric components of straight lines (SGCSL) feature extraction method for offline Arabic/Persian handwritten words recognition

In this study, the authors present a new feature extraction method for handwritten Arabic/Persian word recognition. The feature is based on the angle, number, location, and size of straight lines, which represent geometric and quantitative attributes of a word. First, the word image is divided into m × n windows and straight lines are extracted from each window. Then, the proposed features are computed from these lines and combined. Finally, the features of the images are used to train and test a support vector machine classifier. The proposed method is tested on three datasets: IBN-SINA and IFN/ENIT for Arabic word recognition and Iran-cities for Persian word recognition. The recognition accuracy of the proposed method is about 67.47, 86.22 and 80.78% on the Iran-cities, IBN-SINA and IFN/ENIT datasets, respectively, which is better than that of state-of-the-art methods.
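The windowing and per-window line statistics described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the straight-line segments have already been extracted (for example by a Hough transform) and are given as endpoint tuples, it assigns each segment to a window by its midpoint, and the function name `sgcsl_like_features`, the `angle_bins` parameter, and the exact feature layout (count, normalised total length, angle histogram) are hypothetical choices made for illustration.

```python
import math

def sgcsl_like_features(segments, width, height, m, n, angle_bins=4):
    """Build a feature vector from line segments over an m x n window grid.

    segments: list of (x1, y1, x2, y2) endpoints in image coordinates
              (assumed to be extracted beforehand).
    Returns a flat list with (2 + angle_bins) values per window:
    [line count, total length / window diagonal, angle histogram...].
    """
    win_w = width / n           # window width (n columns)
    win_h = height / m          # window height (m rows)
    diag = math.hypot(win_w, win_h)
    per_window = 2 + angle_bins
    features = [0.0] * (m * n * per_window)

    for x1, y1, x2, y2 in segments:
        # Location: assign the segment to the window containing its midpoint.
        cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
        row = min(int(cy / win_h), m - 1)
        col = min(int(cx / win_w), n - 1)

        # Size: Euclidean length of the segment.
        length = math.hypot(x2 - x1, y2 - y1)

        # Angle: undirected orientation folded into [0, 180) degrees.
        angle = math.degrees(math.atan2(y2 - y1, x2 - x1)) % 180.0
        b = min(int(angle / (180.0 / angle_bins)), angle_bins - 1)

        base = (row * n + col) * per_window
        features[base] += 1                 # number of lines in the window
        features[base + 1] += length / diag # normalised total length
        features[base + 2 + b] += 1         # angle histogram bin

    return features
```

A vector built this way for each word image could then be passed to any support vector machine implementation for training and testing, which is the role the classifier plays in the described method.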
