Handwritten Tamil Character Recognition and Conversion using Neural Network

Handwritten Tamil character recognition refers to the conversion of handwritten Tamil characters into Unicode Tamil characters. The scanned image is segmented into paragraphs using a spatial space detection technique, paragraphs into lines using a vertical histogram, lines into words using a horizontal histogram, and words into character image glyphs using a horizontal histogram. The features extracted for recognition are passed to a Support Vector Machine, a Self-Organizing Map, RCS, a Fuzzy Neural Network, and a Radial Basis Network, where the characters are classified using a supervised learning algorithm. The resulting classes are mapped onto Unicode for recognition, and the text is then reconstructed using Unicode fonts. This kind of character recognition finds applications in document analysis, where a handwritten document can be converted into an editable printed document. The approach can be extended to the recognition and reproduction of handwritten documents in other South Indian languages. On the training set, a recognition rate of 100% was achieved; on the test set, the recognition time per character was 0.1 s and the accuracy was 97%. As expected, the training set produced a higher recognition rate than the test set. Structure analysis suggested that the proposed system, RCS with a back-propagation network, gives a higher recognition rate.

Handwritten character recognition is a difficult problem because of the great variation in writing styles and in the size and orientation angle of the characters. Among the different branches of handwritten character recognition, English letters and numerals are easier to recognize than Tamil characters. Many researchers have applied the generalization capabilities offered by ANNs to the recognition of characters. Many studies have used Fourier descriptors and back-propagation networks for classification tasks. Fourier descriptors have been used to recognize handwritten numerals.
Neural network approaches have been used to classify tools. Only a few past attempts have addressed the recognition of printed or handwritten Tamil characters, and Indian language recognition in general has received less attention. Some efforts have been reported in the literature.
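The histogram-based segmentation mentioned in the abstract can be illustrated with a small projection-profile sketch. This is not the authors' implementation: it assumes a binarized image stored as a list of rows with 1 for ink and 0 for background, and it assumes that lines are separated by empty rows and glyphs by empty columns. The function names are hypothetical, and because the naming of "vertical" and "horizontal" histograms varies between authors, the sketch simply refers to row sums and column sums.

```python
def runs_of_ink(profile):
    """Return (start, end) pairs for maximal runs where the profile is > 0."""
    runs, start = [], None
    for i, value in enumerate(profile):
        if value > 0 and start is None:
            start = i                      # a run of ink begins
        elif value == 0 and start is not None:
            runs.append((start, i))        # the run ended at the gap
            start = None
    if start is not None:
        runs.append((start, len(profile)))  # run reaches the last index
    return runs


def row_profile(image):
    """Ink count per row."""
    return [sum(row) for row in image]


def column_profile(image):
    """Ink count per column."""
    return [sum(col) for col in zip(*image)]


def segment_lines(image):
    """Split a page into text lines at rows that contain no ink."""
    return [image[top:bottom] for top, bottom in runs_of_ink(row_profile(image))]


def segment_glyphs(line):
    """Split a text line into glyph images at columns that contain no ink."""
    return [[row[left:right] for row in line]
            for left, right in runs_of_ink(column_profile(line))]


# Toy 4x4 "page": two glyphs on the first text line, one on the second.
page = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [0, 0, 0, 0],
    [0, 1, 1, 0],
]
lines = segment_lines(page)
glyph_counts = [len(segment_glyphs(line)) for line in lines]
```

Real handwriting rarely has perfectly empty gaps, so a practical version would likely threshold the profile and use the gap width to distinguish word breaks from character breaks; those thresholds are not given in the source.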
