Paper Information - OCR for Telugu Script Using Back-Propagation Based Classifier


This paper deals with the theory and implementation of an Optical Character Recognition (OCR) system for printed Telugu script. The system exploits the inherent characteristics of Telugu, one of the major scheduled languages of India, spoken by more than 66 million people, especially in South India. The principal idea is to convert images of text documents, such as those obtained by scanning, into editable text. The system takes an image as input, separates it step by step into lines, words, and then characters, and recognizes each character using an artificial neural network approach, in which creating a character matrix and a corresponding suitable network structure is key. The feature detection methods are simple and robust. The features considered for classification are the character height, the character width, the number of horizontal lines (long and short), the number of vertical lines (long and short), the number of slope lines, and special dots. Once these features are extracted, the glyphs are ready for classification. The extracted features are passed to a neural network, where the characters are classified by supervised learning with the back-propagation algorithm, which comprises training, calculating the error, and modifying the weights, followed by testing on the given image. The resulting classes are mapped onto Unicode for recognition. Once the characters are recognized, they can be rendered in standard fonts, which makes it possible to integrate information from diverse sources.
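
The classification stage described in the abstract (training, calculating the error, modifying the weights, then mapping classes onto Unicode) can be illustrated with a minimal sketch. This is not the authors' implementation: the single hidden layer, its size, the learning rate, the squared-error loss, and the three six-value feature vectors are all assumptions made for illustration, and the feature values are invented rather than measured from real glyphs. Only the feature names and the use of back-propagation come from the abstract.

```python
import math
import random

# Hypothetical class-to-Unicode table: three Telugu letters chosen for illustration.
CLASS_TO_UNICODE = ["\u0c05", "\u0c15", "\u0c2e"]  # TELUGU LETTER A, KA, MA

# Invented, normalized feature vectors in the order named in the abstract:
# [height, width, horizontal lines, vertical lines, slope lines, special dots]
SAMPLES = [
    ([0.9, 0.4, 0.2, 0.8, 0.0, 0.0], 0),
    ([0.5, 0.9, 0.8, 0.1, 0.2, 0.0], 1),
    ([0.6, 0.6, 0.1, 0.1, 0.9, 1.0], 2),
]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def init_network(n_in, n_hidden, n_out, rng):
    # Each neuron is a list of input weights followed by one bias term.
    hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    output = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)] for _ in range(n_out)]
    return hidden, output

def forward(net, x):
    hidden, output = net
    h = [sigmoid(sum(w * v for w, v in zip(n[:-1], x)) + n[-1]) for n in hidden]
    o = [sigmoid(sum(w * v for w, v in zip(n[:-1], h)) + n[-1]) for n in output]
    return h, o

def train_step(net, x, target, lr):
    hidden, output = net
    h, o = forward(net, x)
    # Error terms for the squared-error loss with sigmoid activations.
    d_out = [(ok - tk) * ok * (1.0 - ok) for ok, tk in zip(o, target)]
    # Propagate the output error back through the (not yet updated) output weights.
    d_hid = [hj * (1.0 - hj) * sum(d_out[k] * output[k][j] for k in range(len(output)))
             for j, hj in enumerate(h)]
    # Modify the weights by gradient descent.
    for k, neuron in enumerate(output):
        for j in range(len(h)):
            neuron[j] -= lr * d_out[k] * h[j]
        neuron[-1] -= lr * d_out[k]
    for j, neuron in enumerate(hidden):
        for i in range(len(x)):
            neuron[i] -= lr * d_hid[j] * x[i]
        neuron[-1] -= lr * d_hid[j]
    return 0.5 * sum((ok - tk) ** 2 for ok, tk in zip(o, target))

def train(samples, n_hidden=5, lr=0.5, epochs=2000, seed=0):
    rng = random.Random(seed)
    n_out = len(CLASS_TO_UNICODE)
    net = init_network(len(samples[0][0]), n_hidden, n_out, rng)
    losses = []
    for _ in range(epochs):
        total = 0.0
        for x, label in samples:
            target = [1.0 if k == label else 0.0 for k in range(n_out)]  # one-hot
            total += train_step(net, x, target, lr)
        losses.append(total)  # summed error over one pass through the samples
    return net, losses

def predict(net, x):
    _, o = forward(net, x)
    return max(range(len(o)), key=o.__getitem__)  # index of the strongest output

net, losses = train(SAMPLES)
# Testing step: classify each feature vector and map the class onto Unicode.
recognized = "".join(CLASS_TO_UNICODE[predict(net, x)] for x, _ in SAMPLES)
print(recognized)
```

In a real system the feature vectors would come from the segmentation and feature-detection stages, and the network would be evaluated on glyphs it was not trained on; the sketch reuses its training vectors only to keep the example short.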

Mandeep Kaur | Rinki Singh

[1] C. Patvardhan, et al. A high accuracy OCR system for printed Telugu text, 2003, TENCON 2003. Conference on Convergent Technologies for Asia-Pacific Region.

[2] Gail Hodge. CENDI Analysis of Scanning/Optical Character Recognition Position Descriptions, 1997.

[3] Mohammed A. Otair, et al. Online Handwritten Character Recognition Using an Optical Backpropagation Neural Network, 2005.

[4] R. Jagadeesh Kannan, et al. A Comparative Study of Optical Character Recognition for Tamil Script, 2005.

[5] R. Seethalakshmi, et al. Optical Character Recognition for printed Tamil text using Unicode, 2005.

[6] Sameer Singh, et al. Optical Character Recognition: Neural Network Analysis of Hand-Printed Characters, 1998, SSPR/SPR.

[7] Adnan Md. Shoeb Shatil. Research report on Bangla optical character recognition using Kohonen network, 2007.

[8] Arun K. Pujari, et al. An Adaptive Character Recognizer for Telugu Scripts Using Multiresolution Analysis, Associative Memory, 2002, ICVGIP.

[9] Ahmad M. Sarhan, et al. Arabic Character Recognition using Artificial Neural Networks and Statistical Analysis, 2007.

[10] Dong-Sik Jang, et al. Optical Character Recognition System Using BP Algorithm, 2008.