Script Identification of Pre-segmented Multi-font Characters and Digits

Character recognition problems for different scripts have their own script-specific characteristics. State-of-the-art optical character recognition systems therefore use different methodologies for different scripts, each chosen to be most effective for the corresponding script. Script identification at the level of the individual character has not received much attention from researchers; most script identification work operates at the document, line, or word level. In a multilingual, multiscript world, the presence of characters from different scripts in a single document is very common. We propose a system to handle this situation in the context of English and Gurumukhi script. We report experiments on multi-font, multi-size characters using Gabor features, based on directional frequency, and gradient features, based on the gradient information of an individual character, to identify each character as Gurumukhi or English and as a character or a numeral. Treating this as a four-class classification problem, we use a multi-class Support Vector Machine (one-vs-one) for classification. Both feature types gave promising results: the average identification rates obtained with Gabor and gradient features are 98.9% and 99.45%, respectively.
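The one-vs-one scheme named in the abstract can be illustrated with a small sketch. It is not the authors' implementation. A nearest-centroid rule stands in for each binary SVM so the example needs only the standard library, and the class labels, function names, and 2-D toy feature vectors are all hypothetical. What it does show is the structure of the scheme: four classes give six pairwise classifiers, and the class with the most pairwise votes wins.

```python
from collections import Counter
from itertools import combinations


def centroid(vectors):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_one_vs_one(samples):
    """Build one binary rule per unordered pair of classes.

    samples maps a class label to its list of feature vectors.
    ASSUMPTION: a nearest-centroid rule stands in for each binary SVM.
    """
    centroids = {label: centroid(vecs) for label, vecs in samples.items()}
    return {
        (a, b): (centroids[a], centroids[b])
        for a, b in combinations(sorted(samples), 2)
    }


def predict_one_vs_one(pairwise, x):
    """Let every pairwise rule vote, then return the most-voted class."""
    votes = Counter()
    for (a, b), (ca, cb) in pairwise.items():
        winner = a if sq_dist(x, ca) <= sq_dist(x, cb) else b
        votes[winner] += 1
    # Ties are broken alphabetically by label, an arbitrary choice.
    return max(sorted(votes), key=lambda label: votes[label])


# Hypothetical 2-D toy features; real Gabor or gradient features
# would have many more dimensions.
toy_samples = {
    "english_char": [[0.0, 0.0], [1.0, 1.0]],
    "english_digit": [[10.0, 0.0], [11.0, 1.0]],
    "gurumukhi_char": [[0.0, 10.0], [1.0, 11.0]],
    "gurumukhi_digit": [[10.0, 10.0], [11.0, 11.0]],
}

pairwise_rules = train_one_vs_one(toy_samples)
predicted = predict_one_vs_one(pairwise_rules, [9.0, 1.0])
print(len(pairwise_rules), predicted)
```

With k classes the scheme trains k(k-1)/2 binary classifiers, so the four-class setup needs six. Replacing the centroid rule with a trained binary SVM changes only how each pairwise winner is chosen; the voting step stays the same.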
