Urdu Character Recognition using Principal Component Analysis

This paper proposes a method for searching Urdu text in image-based Urdu text. Two image databases are created: one for training, named ‘TrainDatabase’, and one for testing, named ‘TestDatabase’. TrainDatabase contains all characters of the Urdu language in their different shapes. Eigenvalues and eigenvectors are computed from the images in TrainDatabase, and only the eigenvectors with the largest eigenvalues are kept. The algorithm then calculates a feature vector for each image in TrainDatabase. A threshold is chosen to define the maximum allowable distance between a TrainDatabase image and a TestDatabase image. A feature vector is also created for each image to be identified, which is placed in TestDatabase. Each character to be identified is compared with every image in TrainDatabase; if it matches any TrainDatabase character, the algorithm reports the result. MATLAB was used as the simulation tool, and the recognition rate obtained was 96.2% for isolated characters.

General Terms: Pattern Recognition, Optical Character Recognition.
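The pipeline described in the abstract can be illustrated with a minimal sketch. It is written in Python with NumPy rather than the MATLAB used in the paper, and it is not the authors' implementation. The function names `train_pca` and `recognize`, the Euclidean distance, the number of retained components, and the use of the smaller sample-by-sample matrix for the eigen-decomposition are all assumptions made for illustration; the abstract does not specify them.

```python
import numpy as np

def train_pca(train_images, num_components):
    """Build a PCA basis from flattened training images.

    train_images: array of shape (n_samples, n_pixels).
    num_components: hypothetical parameter; must not exceed the rank
    of the centered data (at most n_samples - 1).
    """
    X = np.asarray(train_images, dtype=float)
    mean = X.mean(axis=0)
    A = X - mean                                  # center the data
    # Assumption: decompose the small (n_samples x n_samples) matrix A A^T,
    # which shares its non-zero eigenvalues with the pixel matrix A^T A.
    eigvals, eigvecs = np.linalg.eigh(A @ A.T)
    # Keep only the eigenvectors with the largest eigenvalues.
    order = np.argsort(eigvals)[::-1][:num_components]
    basis = A.T @ eigvecs[:, order]               # map back to pixel space
    basis /= np.linalg.norm(basis, axis=0)        # unit-length columns
    features = A @ basis                          # one feature vector per image
    return mean, basis, features

def recognize(test_image, mean, basis, features, labels, threshold):
    """Return the label of the nearest training image, or None when the
    nearest distance exceeds the threshold (Euclidean distance assumed)."""
    f = (np.asarray(test_image, dtype=float) - mean) @ basis
    dists = np.linalg.norm(features - f, axis=1)
    best = int(np.argmin(dists))
    return labels[best] if dists[best] <= threshold else None

# Toy usage with random 8x8 "images" standing in for character images.
rng = np.random.default_rng(0)
train = rng.random((5, 64))
labels = ["a", "b", "c", "d", "e"]                # placeholder labels
mean, basis, features = train_pca(train, num_components=3)
match = recognize(train[2], mean, basis, features, labels, threshold=1.0)
```

In this sketch the threshold plays the role described in the abstract: a test image whose nearest training feature vector is farther away than the threshold is rejected rather than assigned a label.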
