Open-vocabulary recognition of machine-printed Arabic text using hidden Markov models

Abstract: In this paper, we present an approach to multi-font printed Arabic text recognition based on hidden Markov models (HMMs). We propose a novel variant of the sliding window technique for feature extraction, in which the size and position of the window's cells adapt to the writing line of the Arabic text and to the ink-pixel distribution. For mixed-font text, we employ a two-step approach: the input text line image is first associated with the closest known font, using simple and effective font identification features, and is then recognized by the recognizer trained for that font. This approach proves more effective than using a single recognizer trained on samples from multiple fonts. We also present a framework for recognizing text in unseen fonts, which employs font association and HMM adaptation techniques. Experiments on two separate databases of printed Arabic text demonstrate the effectiveness of the presented techniques, which can be readily adapted to other scripts, such as Roman script.
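
The two-step scheme for mixed-font text can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the two features (ink density and relative position of the densest row), the nearest-centroid font association, and all function names are hypothetical, and the per-font recognizers are stand-in callables where the paper uses trained HMM recognizers.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

# Binary text line image: 1 = ink pixel, 0 = background (assumed representation).
Image = Sequence[Sequence[int]]


def font_features(image: Image) -> List[float]:
    """Hypothetical font features: ink density and relative position of the densest row."""
    height = len(image)
    width = len(image[0])
    row_ink = [sum(row) for row in image]
    density = sum(row_ink) / (height * width)
    # The densest row serves here as a crude stand-in for the writing line.
    peak_row = max(range(height), key=lambda r: row_ink[r]) / height
    return [density, peak_row]


def associate_font(features: List[float], centroids: Dict[str, List[float]]) -> str:
    """Step 1: pick the known font whose feature centroid is closest (Euclidean distance)."""
    return min(centroids, key=lambda name: math.dist(features, centroids[name]))


def recognize_line(
    image: Image,
    centroids: Dict[str, List[float]],
    recognizers: Dict[str, Callable[[Image], str]],
) -> Tuple[str, str]:
    """Step 2: decode the line with the recognizer trained for the associated font."""
    font = associate_font(font_features(image), centroids)
    return font, recognizers[font](image)
```

In use, `centroids` would hold one feature vector per known font (estimated from training lines) and `recognizers` one trained recognizer per font. A line in an unseen font is still routed to the closest known font, which is the point where the abstract's HMM adaptation step would apply.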
