Script identification of handwritten word images

This paper describes a system for script identification of handwritten word images. The system has two main phases, training and testing. The training phase performs moment-based feature extraction on the training word images and generates their feature vectors. The testing phase extracts moment features from a test word image and classifies it into one of the candidate script classes using the feature vectors obtained during training. Experiments are reported on handwritten word images from three scripts: Latin, Devanagari and Arabic. Three different classifiers are evaluated on a dataset of 12,000 word images in the training set and 7,942 word images in the testing set. All three classifiers consistently achieve an accuracy of over 97%, which indicates the strength of the approach.
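The two-phase pipeline can be illustrated with a minimal sketch. The abstract does not say which moments or which classifiers are used, so everything specific here is an assumption made for illustration: the features are scale-normalized central geometric moments of orders 2 and 3, the classifier is a simple 1-nearest-neighbour rule, and the function names `moment_features` and `classify`, the toy images, and the toy labels are all hypothetical.

```python
import math


def moment_features(image, max_order=3):
    """Return normalized central moments eta_pq for 2 <= p + q <= max_order.

    `image` is a list of rows of non-negative ink intensities
    (rows index y, columns index x). Assumed feature set, not the paper's.
    """
    m00 = sum(sum(row) for row in image)
    if m00 == 0:
        raise ValueError("image contains no ink")
    # Centroid of the ink mass; centering makes the features translation invariant.
    cx = sum(x * v for row in image for x, v in enumerate(row)) / m00
    cy = sum(y * v for y, row in enumerate(image) for v in row) / m00
    features = []
    for order in range(2, max_order + 1):
        for p in range(order, -1, -1):
            q = order - p
            mu = sum(((x - cx) ** p) * ((y - cy) ** q) * v
                     for y, row in enumerate(image)
                     for x, v in enumerate(row))
            # Dividing by m00^(1 + order/2) makes the moment scale invariant.
            features.append(mu / m00 ** (1 + order / 2))
    return features


def classify(test_image, training_set):
    """Testing phase: label of the nearest trained feature vector (1-NN, assumed)."""
    target = moment_features(test_image)
    nearest = min(training_set, key=lambda item: math.dist(item[0], target))
    return nearest[1]


# Toy 5x5 "word images": a horizontal stroke and a vertical stroke.
horizontal = [[1 if y == 2 else 0 for x in range(5)] for y in range(5)]
vertical = [[1 if x == 2 else 0 for x in range(5)] for y in range(5)]

# Training phase: extract and store one feature vector per labelled image.
training_set = [
    (moment_features(horizontal), "horizontal"),
    (moment_features(vertical), "vertical"),
]

# Testing phase: a thicker horizontal stroke should match the horizontal class.
thick_horizontal = [[1 if y in (1, 2) else 0 for x in range(5)] for y in range(5)]
predicted = classify(thick_horizontal, training_set)
print(predicted)  # prints "horizontal"
```

In a real setting the training set would hold feature vectors for many word images per script, and the nearest-neighbour rule would be replaced by whichever classifiers are being compared.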
