Borno: Bangla Handwritten Character Recognition Using a Multiclass Convolutional Neural Network

Handwriting recognition remains an unsolved problem. Advances in artificial intelligence and machine learning have made Optical Character Recognition (OCR) systems more effective, yet there are still no serious commercially available OCR systems for many low-resource languages, such as Bangla. Bangla presents additional challenges: vowels and consonants in the middle of words are often abbreviated and replaced with notations called diacritics, and multiple letters can be combined into shorthand representations called compound characters. Compound characters can themselves carry diacritics, which makes the recognition task extremely complex. A successful commercial OCR system must therefore model not only individual characters but also these diacritics and compound characters, which leads us to propose a grapheme-based holistic recognition approach. Borno is the first multiclass convolutional neural network-based deep learning model that can recognize Bangla handwritten characters with graphemes. The proposed model was trained on a dataset of 1,069,132 images covering 50 basic character, 10 numeral, 146 compound character, 10 modifier, and 6 consonant diacritic classes. The trained Borno model achieves an average character recognition accuracy of 92.61% on the validation set.
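
As a quick check on the label space described above, the five class groups can be tallied. This is only an illustrative sketch: the dictionary keys are hypothetical names, and it assumes the groups are disjoint. Whether the model uses a single output layer over all classes is not stated here.

```python
# Class counts per group, taken from the abstract.
# The group names are illustrative labels, not identifiers from the paper.
class_groups = {
    "basic_characters": 50,
    "numerals": 10,
    "compound_characters": 146,
    "modifiers": 10,
    "consonant_diacritics": 6,
}

# Assumption: the groups do not overlap, so the total is a plain sum.
total_classes = sum(class_groups.values())

# Average number of training images per class, given the stated dataset size.
dataset_size = 1_069_132
images_per_class = dataset_size / total_classes

print(f"total classes: {total_classes}")
print(f"average images per class: {images_per_class:.0f}")
```

Under these assumptions, a classifier covering every group would need one output per class in the tally.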
