Optical character recognition using an artificial neural network

The objective of this work is to convert printed text or handwritten characters, recorded offline with scanners or cameras, into machine-usable text by means of a simulated neural network, reducing the manual effort human workers spend collecting and storing such data. A further goal is to provide an alternative algorithm that recognizes characters faster and with higher accuracy. To that end, we use an artificial neural network and make it more tolerant of anomalies in the recorded image. Conventional optical character recognition typically relies on simple edge detection followed by matching the detected edges against predefined patterns. In this research, characters are recognized even in the presence of noise such as inclination and skew: the network is trained to look for discrepancies in the data and to resolve them using vocabulary, grammar, and the characters that commonly follow a given character. Each image is also masked in several ways, and the masked versions are processed individually to increase the confidence of the prediction.
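
The abstract does not specify the network, the masking scheme, or how context is applied. The Python sketch below illustrates one plausible reading of the last two ideas, under stated assumptions: the trained network is abstracted as a hypothetical classify callable that returns a character-to-probability map, the five half-masks in mask_variants are an illustrative choice, and the "characters that commonly follow a given character" are modeled as an assumed bigram table bigram[prev][c]. Predictions from the masked copies are averaged, then rescored by adding weight * log P(c | prev).

```python
import math
from typing import Callable, Dict, List

Image = List[List[int]]          # binary glyph: rows of 0/1 pixels
Probs = Dict[str, float]         # character -> probability
Classifier = Callable[[Image], Probs]


def mask_variants(img: Image) -> List[Image]:
    """Return the glyph plus four copies with one half blanked out.

    The choice of half-masks is a hypothetical scheme for illustration only.
    """
    h, w = len(img), len(img[0])

    def keep_only(pred) -> Image:
        # Copy pixels where pred(row, col) is true; zero out the rest.
        return [[img[r][c] if pred(r, c) else 0 for c in range(w)] for r in range(h)]

    return [
        [row[:] for row in img],               # unmasked copy
        keep_only(lambda r, c: r >= h // 2),   # top half hidden
        keep_only(lambda r, c: r < h // 2),    # bottom half hidden
        keep_only(lambda r, c: c >= w // 2),   # left half hidden
        keep_only(lambda r, c: c < w // 2),    # right half hidden
    ]


def ensemble_predict(img: Image, classify: Classifier) -> Probs:
    """Average the classifier's output distribution over all masked variants."""
    variants = mask_variants(img)
    avg: Probs = {}
    for v in variants:
        for ch, p in classify(v).items():
            avg[ch] = avg.get(ch, 0.0) + p / len(variants)
    return avg


def decode_with_context(probs: Probs, prev: str, bigram: Dict[str, Probs],
                        weight: float = 1.0, floor: float = 1e-6) -> str:
    """Pick the argmax of log P_net(c) + weight * log P(c | prev).

    bigram[prev][c] is an assumed character-transition table (e.g. counted
    from a text corpus); missing or zero entries fall back to `floor` so
    that log() is never applied to zero.
    """
    context = bigram.get(prev, {})

    def score(ch: str) -> float:
        net = math.log(max(probs[ch], floor))
        ctx = math.log(max(context.get(ch, 0.0), floor))
        return net + weight * ctx

    return max(probs, key=score)


# Stand-in for a trained network: it slightly prefers the digit "0".
def toy_classify(v: Image) -> Probs:
    ink = sum(map(sum, v))
    return {"0": 0.55, "O": 0.45} if ink else {"0": 0.5, "O": 0.5}


glyph = [[0, 1, 1, 0],
         [1, 0, 0, 1],
         [1, 0, 0, 1],
         [0, 1, 1, 0]]
probs = ensemble_predict(glyph, toy_classify)
print(decode_with_context(probs, prev="", bigram={}))
print(decode_with_context(probs, prev="B", bigram={"B": {"O": 0.9, "0": 0.1}}))
```

In this toy run the stand-in network alone favors the digit 0, but after the letter B the assumed bigram prior tips the decision to the letter O. Averaging over masked copies is a simple form of test-time ensembling; in practice the weight given to the context term would need to be tuned on held-out data.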
