On developing complete character set Meitei Mayek handwritten character database

This paper introduces a large-scale Meitei Mayek handwritten character database that covers the complete character set of the script. It contains 85,124 character images in 55 character classes, with 72,330 images in the training set and 12,794 in the test set. The work focuses on capturing the natural handwriting of individuals, so samples were collected in two phases: (a) unconstrained handwriting in the form of answer sheets and classroom notes and (b) tabular forms. Nearly 500 individuals contributed to the development of the database. Recognition of the character images is carried out using different feature descriptors with four popular classifiers, namely KNN, Linear Support Vector Classifier, Random Forest and Support Vector Machine. The paper also proposes a convolutional neural network (CNN) model obtained by tuning the hyperparameters of a base CNN architecture. Experimental results show that the CNN model can serve as a benchmark for the database, with a test accuracy of 95.56%.
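The figures quoted above can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration: the constant and function names are hypothetical, and the per-class value is a plain average, since the abstract does not say whether the 55 classes are balanced.

```python
# Figures taken from the abstract; all names here are illustrative only.
TRAIN_IMAGES = 72_330
TEST_IMAGES = 12_794
NUM_CLASSES = 55
REPORTED_TOTAL = 85_124


def split_summary(train: int, test: int, classes: int) -> dict:
    """Return the total image count, the split shares, and the mean images per class."""
    total = train + test
    return {
        "total": total,
        "train_share": train / total,
        "test_share": test / total,
        # Average only: the abstract does not state that classes are balanced.
        "mean_per_class": total / classes,
    }


summary = split_summary(TRAIN_IMAGES, TEST_IMAGES, NUM_CLASSES)

# The training and test counts should add up to the reported total.
assert summary["total"] == REPORTED_TOTAL

print(f"total images:    {summary['total']}")
print(f"train share:     {summary['train_share']:.1%}")
print(f"test share:      {summary['test_share']:.1%}")
print(f"mean per class:  {summary['mean_per_class']:.1f}")
```

The training and test counts sum to the reported total, and the split works out to roughly 85% training and 15% test images.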
