HMM-based Indic handwritten word recognition using zone segmentation

This paper presents a novel approach to Indic handwritten word recognition using zone-wise information. Indic scripts (e.g. Devanagari, Bangla, Gurumukhi, and other similar scripts) contain compound characters, modifiers, and overlapping or touching strokes, which make character segmentation and recognition difficult. To avoid character segmentation in such scripts, earlier work applied HMM-based sequence modeling holistically to the whole word. This paper proposes an efficient word recognition framework that segments handwritten word images horizontally into three zones (upper, middle and lower) and then recognizes the components of each zone. The main aim of zone segmentation is to reduce the number of distinct component classes compared to the total number of character classes in Indic scripts, and this reduction improves the recognition performance of the system. The components in the middle zone, where characters are mostly touching, are recognized using an HMM. After middle-zone recognition, HMM-based Viterbi forced alignment is applied to mark the left and right boundaries of the characters in the middle zone. Next, any residual components in the upper and lower zones that fall within a character boundary are combined with that character to obtain the final word-level recognition. Water reservoir-based properties are integrated into the framework to correct defects in zone segmentation and character boundary detection. A novel sliding window-based feature, called Pyramid Histogram of Oriented Gradient (PHOG), is proposed for middle-zone recognition. PHOG features have been compared with other existing features and found to be robust for Indic script recognition. Exhaustive experiments are performed on two Indic scripts, Bangla and Devanagari, for performance evaluation. The experiments show that the proposed zone-wise recognition improves accuracy over traditional Indic word recognition without zone segmentation.

A novel approach to Indic handwritten word recognition using zone segmentation.
Efficient PHOG features developed to improve the performance of HMM-based middle-zone recognition.
Integration of the water reservoir concept for better character alignment in a word image.
A detailed study of experimental results on Bangla and Devanagari scripts.
The proposed framework outperforms traditional recognition systems that do not use zone segmentation.
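
The sliding window-based PHOG feature mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `phog` and `sliding_phog`, the window width, the step, the number of pyramid levels, and the number of orientation bins are all assumptions chosen for illustration, since the abstract does not state them. The sketch computes gradient-orientation histograms over a spatial pyramid of cells for each window of a grayscale middle-zone image, producing one feature vector per window position, which is the kind of frame sequence an HMM could consume.

```python
import numpy as np


def phog(patch, levels=2, bins=8):
    """PHOG descriptor of one grayscale patch (hypothetical parameters)."""
    patch = np.asarray(patch, dtype=float)
    gy, gx = np.gradient(patch)                  # derivatives along rows, columns
    mag = np.hypot(gx, gy)                       # gradient magnitude
    ang = np.mod(np.arctan2(gy, gx), np.pi)      # unsigned orientation in [0, pi)
    # Map each pixel's orientation to a bin; clamp guards against rounding to `bins`.
    bin_idx = np.minimum((ang / np.pi * bins).astype(int), bins - 1)

    h, w = patch.shape
    feats = []
    for level in range(levels + 1):
        cells = 2 ** level                       # cells x cells grid at this level
        row_edges = np.linspace(0, h, cells + 1).astype(int)
        col_edges = np.linspace(0, w, cells + 1).astype(int)
        for r in range(cells):
            for c in range(cells):
                rs = slice(row_edges[r], row_edges[r + 1])
                cs = slice(col_edges[c], col_edges[c + 1])
                # Magnitude-weighted orientation histogram of this cell.
                hist = np.bincount(bin_idx[rs, cs].ravel(),
                                   weights=mag[rs, cs].ravel(),
                                   minlength=bins)
                feats.append(hist)

    vec = np.concatenate(feats)
    total = vec.sum()
    return vec / total if total > 0 else vec     # L1-normalise non-empty patches


def sliding_phog(image, win_width=8, step=4, levels=2, bins=8):
    """Slide a window left to right and return one PHOG vector per position."""
    image = np.asarray(image, dtype=float)
    _, w = image.shape
    starts = range(0, max(w - win_width, 0) + 1, step)
    return np.stack([phog(image[:, s:s + win_width], levels, bins)
                     for s in starts])
```

With the assumed defaults, each window yields (1 + 4 + 16) cells times 8 bins, i.e. a 168-dimensional vector. Concatenating coarse and fine pyramid levels lets the descriptor capture both the overall stroke direction in a window and its local spatial layout.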
