Gujrati character recognition using weighted k-NN and Mean χ² distance measure

With advances in digitization, document analysis and handwriting recognition have emerged as key research areas. We present a handwritten character recognition system for Gujrati, an Indian language spoken by 40 million people. The proposed system extracts four features. Two of them, a unique pattern descriptor and a Gabor phase XNOR pattern, are newly proposed for the isolated handwritten character set of Gujrati. In addition to these two, we use the contour direction probability distribution function and autocorrelation features. The next contribution is a weighted k-NN classifier, and the final contribution is a novel mean χ² distance measure. The proposed classifier combines feature weights and the new distance measure with a triangular distance and the Euclidean distance, improving on the performance of the conventional k-NN classifier. The implementation achieves 86.33% recognition efficiency on a comprehensive data set, and the results show that the proposed approach outperforms conventional k-NN. We conclude that, despite the shape ambiguities in Indian scripts, the proposed classification algorithm could be a dominant technique in handwritten character recognition.
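The general idea of a k-NN classifier that mixes feature weights with several distance measures can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not define the mean χ² measure, the feature weights, or how the distances are combined, so the sketch substitutes a symmetrised Pearson χ² for the proposed measure, sums the three distances per feature, and uses inverse-distance voting. The feature names (`contour`, `autocorr`), the weights, the labels, and the toy vectors are all hypothetical.

```python
import math
from collections import defaultdict

EPS = 1e-12  # guards against division by zero on empty histogram bins


def symmetric_chi_square(p, q):
    """Mean of the two Pearson chi-square directions (an assumed stand-in,
    not the paper's mean chi-square measure, which the abstract leaves undefined)."""
    forward = sum((a - b) ** 2 / (b + EPS) for a, b in zip(p, q))
    backward = sum((a - b) ** 2 / (a + EPS) for a, b in zip(p, q))
    return 0.5 * (forward + backward)


def triangular(p, q):
    """Triangular discrimination: sum((p - q)^2 / (p + q))."""
    return sum((a - b) ** 2 / (a + b + EPS) for a, b in zip(p, q))


def euclidean(p, q):
    """Ordinary Euclidean distance."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def combined_distance(x, y, feature_weights):
    """Weighted sum over features of the three distances (assumed combination rule)."""
    total = 0.0
    for name, weight in feature_weights.items():
        p, q = x[name], y[name]
        total += weight * (symmetric_chi_square(p, q) + triangular(p, q) + euclidean(p, q))
    return total


def weighted_knn(query, train, feature_weights, k=3):
    """Classify `query` by inverse-distance-weighted vote among its k nearest samples."""
    scored = sorted(
        (combined_distance(query, feats, feature_weights), label)
        for feats, label in train
    )
    votes = defaultdict(float)
    for dist, label in scored[:k]:
        votes[label] += 1.0 / (dist + 1e-9)
    return max(votes, key=votes.get)


# Hypothetical toy data: two normalised feature histograms per sample.
weights = {"contour": 0.6, "autocorr": 0.4}
train = [
    ({"contour": [0.70, 0.20, 0.10], "autocorr": [0.60, 0.40]}, "class_a"),
    ({"contour": [0.65, 0.25, 0.10], "autocorr": [0.55, 0.45]}, "class_a"),
    ({"contour": [0.10, 0.20, 0.70], "autocorr": [0.30, 0.70]}, "class_b"),
    ({"contour": [0.15, 0.15, 0.70], "autocorr": [0.35, 0.65]}, "class_b"),
]
query = {"contour": [0.68, 0.22, 0.10], "autocorr": [0.58, 0.42]}
predicted = weighted_knn(query, train, weights, k=3)
print(predicted)
```

In this sketch a closer neighbour contributes a larger vote, so the two nearby `class_a` samples outweigh the single distant `class_b` sample among the three nearest neighbours.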
