A survey on optical character recognition for Bangla and Devanagari scripts

The past few decades have witnessed intensive research on optical character recognition (OCR) for Roman, Chinese, and Japanese scripts. A considerable body of work has also been reported on OCR for various Indian scripts, such as Devanagari, Bangla, Oriya, Tamil, Telugu, Malayalam, Kannada, Gurmukhi, and Gujarati. In this paper, we present a review of OCR work on Indian scripts, focusing mainly on Bangla and Devanagari, the two most popular scripts in India. We summarize most of the published papers on this topic and analyse the various methodologies and their reported results. Future directions of research in OCR for Indian scripts are also given.
