Automation of Indian Postal Documents Written in Bangla and English

In this paper, we present a system towards Indian postal automation based on pin-code and city name recognition. First, non-text blocks (postal stamp, postal seal, etc.) are detected using the Run Length Smoothing Approach (RLSA), and the Destination Address Block (DAB) is identified from the postal document using positional information. Next, the lines and words of the DAB are segmented. In India, the address part of a postal document may be written in a combination of two scripts: Latin (English) and a local (state/region) script. It is very difficult to identify the script in which the pin-code part is written. To overcome this problem, we use a two-stage artificial neural network based general scheme to recognize pin-code numerals written in either of the two scripts. To identify the script in which a word/city name is written, we propose a feature based on the water reservoir concept. For recognition of city names, we propose an NSHP-HMM (Non-Symmetric Half Plane-Hidden Markov Model) based technique. At present, the accuracy of the proposed numeral recognition module is 93.14%, while that of the city name recognition scheme is 86.44%.
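As an illustration of the run-length smoothing idea mentioned above, the following is a minimal sketch of horizontal smoothing on a binary image. It is not the system's actual implementation: the function names `rlsa_row` and `rlsa_horizontal`, the pixel convention (1 = ink, 0 = background), the threshold value, and the choice to fill only gaps bounded by ink on both sides are all assumptions made for the example.

```python
def rlsa_row(row, threshold):
    """Smooth one binary row: fill short background gaps between ink pixels.

    Assumed convention: 1 = ink (foreground), 0 = background.
    A run of 0s is filled with 1s only if it lies between two ink pixels
    and its length is at most `threshold` (hypothetical parameter).
    """
    out = list(row)
    last_ink = None  # index of the most recent ink pixel seen so far
    for i, pixel in enumerate(row):
        if pixel == 1:
            if last_ink is not None:
                gap = i - last_ink - 1
                if 0 < gap <= threshold:
                    # Merge the two ink pixels by filling the gap between them.
                    for j in range(last_ink + 1, i):
                        out[j] = 1
            last_ink = i
    return out


def rlsa_horizontal(image, threshold):
    """Apply row-wise smoothing to a binary image given as a list of rows."""
    return [rlsa_row(row, threshold) for row in image]


if __name__ == "__main__":
    row = [0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0]
    # The gap of length 2 is filled; the gap of length 4 is left alone.
    print(rlsa_row(row, threshold=2))  # [0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
```

After smoothing, nearby characters merge into larger connected blocks, which is what allows a later step to separate text regions from non-text regions such as stamps and seals by their size and position.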
