An Efficient Multi Lingual Optical Character Recognition System for Indian Languages Through Use of Bharati Script

Optical character recognition (OCR) plays a critical role in interpreting videos and documents. Document-specific issues such as low image quality, distortions, composite backgrounds, and noise, together with language-specific issues such as cursive connectivity among characters, make OCR challenging and error-prone for Indian languages. The language-specific challenges can be overcome by computing script-based features, which yields better accuracy. However, computing script-based invariant features and patterns is computationally complex and error-prone. Against this background, we put forward an OCR system based on Bharati script (www.bharatiscript.com), in which the inherent drawbacks of Indian scripts such as Hindi, Tamil, and Telugu are eliminated. The proposed OCR model has been tested on a synthetic dataset of Bharati script documents (in which Hindi text is converted to Bharati script). Thorough experimental analysis with varied levels of noise confirms promising character recognition accuracy for the proposed OCR model, which outperforms state-of-the-art OCR systems for Indian scripts. The proposed model achieves 76.70% accuracy on test documents with 50% noise and 99.98% on test documents with 0% noise.
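As a rough illustration of the evaluation described above, the following Python sketch corrupts a binary document image at a chosen noise level and scores a recognizer's output by character accuracy. The abstract does not say what kind of noise was used or how accuracy was computed, so both choices here are assumptions: noise is modeled as flipping a fixed fraction of pixels, and accuracy is taken as one minus the normalized edit distance. The function names `add_pixel_noise`, `edit_distance`, and `character_accuracy` are hypothetical and do not come from the paper.

```python
import random


def add_pixel_noise(image, noise_fraction, seed=0):
    """Return a copy of a binary image (rows of 0/1) with a fraction of pixels flipped.

    Assumption: "N% noise" is modeled as flipping N% of the pixels.
    """
    rng = random.Random(seed)
    height, width = len(image), len(image[0])
    total = height * width
    n_noisy = round(noise_fraction * total)
    noisy = [row[:] for row in image]
    # Pick distinct pixel positions and invert each one.
    for idx in rng.sample(range(total), n_noisy):
        r, c = divmod(idx, width)
        noisy[r][c] = 1 - noisy[r][c]
    return noisy


def edit_distance(a, b):
    """Levenshtein distance between two strings (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]


def character_accuracy(reference, predicted):
    """Assumed metric: 1 - edit_distance / len(reference), floored at 0."""
    if not reference:
        return 1.0 if not predicted else 0.0
    return max(0.0, 1.0 - edit_distance(reference, predicted) / len(reference))
```

In such a setup, each test page would be passed through `add_pixel_noise` at levels from 0.0 to 0.5 before recognition, and `character_accuracy` would be averaged over pages at each level.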
