Recognition of printed Devanagari text using BLSTM Neural Network

In this paper, we propose a recognition scheme for the Indian script Devanagari. Recognition accuracy for Devanagari is not yet comparable to that for Roman scripts, mainly because of the complexity of the script, variations in writing style, and other factors. Our solution uses a recurrent neural network known as Bidirectional Long Short-Term Memory (BLSTM). Our approach does not require word-to-character segmentation, which is one of the most common causes of high word error rates. We report a reduction of more than 20% in word error rate and over 9% in character error rate compared with the best available OCR system.
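The abstract reports results as word and character error rates without stating how they are computed. The following is a minimal sketch of one common convention, assuming both rates are Levenshtein edit distances normalized by the reference length. The function names (`edit_distance`, `character_error_rate`, `word_error_rate`) are hypothetical, and the paper's actual evaluation protocol may differ, for example by counting a word as wrong whenever any of its characters is wrong.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists)."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_error_rate(reference, hypothesis):
    """Edit distance over Unicode code points, divided by reference length.

    Assumption: errors are counted per code point, so a Devanagari
    consonant and its vowel sign count as separate symbols.
    """
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)


def word_error_rate(reference, hypothesis):
    """Edit distance over whitespace-separated words, divided by word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)
```

For example, `character_error_rate("नमस्ते", "नमस्त")` counts one dropped vowel sign against six reference code points. Counting by code point rather than by grapheme cluster is a design choice of this sketch, not something the abstract specifies.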
