IDENTIFICATION OF DEVANAGARI SCRIPT FROM IMAGE DOCUMENT

Text that appears in images carries useful and important information. Optical Character Recognition (OCR) technology is restricted to finding text printed against clean backgrounds; it cannot handle text printed against shaded or textured backgrounds or text embedded in images. Extracting text from images is therefore necessary, and it can help blind and visually impaired people when a voice synthesizer is attached to the system. In this paper, we present a methodology for extracting text from printed document images and then identifying Devanagari script (Hindi language) in the extracted text. First, we use a morphological approach to extract the text from document images. The resulting text image is passed to OCR for identification. A projection profile is used for segmentation, followed by a visual discriminating approach for feature extraction. Finally, heuristic search is used for classification. The results of the proposed text extraction method are compared with those of an edge-based approach and a connected-component approach with projection profile. Comparison using precision and recall rates shows that the proposed algorithm works well.
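
As an illustration of two steps named in the abstract, the sketch below shows projection-profile segmentation and the precision and recall measures in plain Python. It is a minimal sketch under stated assumptions, not the paper's implementation: the function names, the toy binary image, and the detection counts are all hypothetical, and the input is assumed to be already binarized (1 = text pixel, 0 = background).

```python
from typing import List, Tuple

Image = List[List[int]]  # binary image: 1 = text pixel, 0 = background (assumed)


def horizontal_profile(image: Image) -> List[int]:
    """Count text pixels in each row."""
    return [sum(row) for row in image]


def vertical_profile(image: Image) -> List[int]:
    """Count text pixels in each column."""
    return [sum(col) for col in zip(*image)]


def nonzero_runs(profile: List[int]) -> List[Tuple[int, int]]:
    """Return (start, end) pairs, end exclusive, for each run of non-zero entries."""
    runs = []
    start = None
    for i, value in enumerate(profile):
        if value > 0 and start is None:
            start = i                      # a run of text begins
        elif value == 0 and start is not None:
            runs.append((start, i))        # a gap closes the current run
            start = None
    if start is not None:
        runs.append((start, len(profile)))  # run reaches the image border
    return runs


def precision_recall(true_pos: int, false_pos: int, false_neg: int) -> Tuple[float, float]:
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    detected = true_pos + false_pos
    actual = true_pos + false_neg
    precision = true_pos / detected if detected else 0.0
    recall = true_pos / actual if actual else 0.0
    return precision, recall


# Hypothetical 6x6 page with two text lines separated by a blank row.
page = [
    [0, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 1, 0],
    [1, 1, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
]

# Rows with no text pixels separate the text lines.
text_lines = nonzero_runs(horizontal_profile(page))

# Within the first text line, empty columns separate the components.
top, bottom = text_lines[0]
components = nonzero_runs(vertical_profile(page[top:bottom]))

# Hypothetical counts: 8 correct detections, 2 false alarms, 2 misses.
precision, recall = precision_recall(8, 2, 2)
```

Here the row profile splits the page into text lines and the column profile splits one line into components. Real document images would normally need noise handling (for example a minimum run length or a threshold above zero), which this sketch leaves out.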
