Convolutional Neural Networks for the Recognition of Malayalam Characters

Optical Character Recognition (OCR) plays an important role in information retrieval by converting scanned documents into machine-editable, searchable text. This work focuses on the recognition stage of OCR. LeNet-5, a Convolutional Neural Network (CNN) trained with gradient-based learning and the backpropagation algorithm, is used to classify Malayalam character images. Results for the multi-class classifier show that CNN performance drops when the number of classes exceeds roughly 40. Accuracy improves when characters that are misclassified as one another are grouped together: without grouping, the CNN gives an average accuracy of 75%, and after grouping the accuracy rises to as much as 92%. Classification within each group is done using a multi-class SVM, which gives an average accuracy in the range of 99-100%.
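The abstract does not say how the groups of misclassified characters are formed. The following is a minimal sketch of one possible approach, assuming groups are derived from a confusion matrix by merging any two classes whose confusion rate reaches a chosen threshold. The function names, the threshold value, and the toy 4-class matrix are all hypothetical and are not taken from the paper.

```python
def group_confused_classes(confusion, threshold):
    """Merge classes i and j when the share of class-i samples
    predicted as j reaches `threshold` (hypothetical grouping rule).
    Returns a list mapping each class index to its group id."""
    n = len(confusion)
    parent = list(range(n))  # union-find forest, one tree per group

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        row_total = sum(confusion[i])
        for j in range(n):
            if i != j and row_total and confusion[i][j] / row_total >= threshold:
                ri, rj = find(i), find(j)
                if ri != rj:
                    # Keep the smaller index as the group representative.
                    parent[max(ri, rj)] = min(ri, rj)
    return [find(i) for i in range(n)]


def accuracy(confusion, group_of=None):
    """Fraction of samples whose prediction falls in the true class's group.
    With no grouping, every class is its own group."""
    n = len(confusion)
    if group_of is None:
        group_of = list(range(n))
    total = sum(sum(row) for row in confusion)
    correct = sum(
        confusion[i][j]
        for i in range(n)
        for j in range(n)
        if group_of[i] == group_of[j]
    )
    return correct / total


# Toy confusion matrix (rows: true class, columns: predicted class).
# These numbers are illustrative only, not results from the paper.
toy = [
    [70, 30, 0, 0],
    [25, 75, 0, 0],
    [0, 0, 95, 5],
    [0, 0, 4, 96],
]
groups = group_confused_classes(toy, threshold=0.2)
ungrouped_acc = accuracy(toy)
grouped_acc = accuracy(toy, groups)
```

In this toy case the first two classes are merged into one group, so the outer classifier only has to pick the group; a second-stage classifier (a multi-class SVM in the paper) would then separate the members of each group.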
