Offline Recognition of Devanagari Script: A Survey

In India, more than 300 million people use the Devanagari script for documentation. Research on the recognition of printed and handwritten Devanagari text has improved significantly in the past few years. This paper discusses the state of the art, from the 1970s onward, in optical character recognition (OCR) of machine-printed and handwritten Devanagari. The feature-extraction techniques, as well as the training, classification, and matching techniques useful for recognition, are discussed in the various sections of the paper. An attempt is made to address the most important results reported so far and to highlight the beneficial directions of the research to date. The paper also contains a comprehensive bibliography of selected papers that appeared in reputed journals and conference proceedings, as an aid for researchers working in the field of Devanagari OCR.
