Optical character recognition (OCR) using partial least square (PLS) based feature reduction: an application to artificial intelligence for biometric identification

Purpose
In artificial intelligence, optical character recognition (OCR) is an active research area, driven by well-known applications such as automation and the conversion of printed documents into machine-readable text. In academia and banking, the main purpose of OCR is to achieve high recognition performance while saving storage space.

Design/methodology/approach
A novel technique is proposed for automated OCR based on the fusion and selection of multi-property features. The features are fused using a serial formulation, and the output is passed to a partial least squares (PLS) based selection method. The selection is driven by an entropy fitness function. The final features are classified by an ensemble classifier.

Findings
The presented method was extensively tested on two datasets, one collected by the authors and the Chars74k benchmark, and achieved accuracies of 91.2% and 99.9%. A comparison with existing techniques shows that the proposed method gives improved performance.

Originality/value
The technique presented in this work will help with license plate recognition and with converting text from printed documents into machine-readable form.
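The fusion and selection steps described under Design/methodology/approach can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy data, and the choice to rank features by the magnitude of the first PLS weight vector (which is proportional to the covariance between each centred feature and the centred target) are assumptions made for illustration. The entropy fitness function and the ensemble classifier are omitted, and class labels are treated as a single numeric target, which only suits a two-class toy case.

```python
import math


def serial_fusion(*feature_sets):
    """Serially fuse descriptors by concatenating each sample's feature vectors."""
    n = len(feature_sets[0])
    if any(len(fs) != n for fs in feature_sets):
        raise ValueError("all feature sets must describe the same samples")
    return [sum((list(fs[i]) for fs in feature_sets), []) for i in range(n)]


def pls_first_weights(X, y):
    """First PLS weight vector: w is proportional to Xc^T yc, scaled to unit length."""
    n, p = len(X), len(X[0])
    col_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    w = [
        sum((X[i][j] - col_mean[j]) * (y[i] - y_mean) for i in range(n))
        for j in range(p)
    ]
    norm = math.sqrt(sum(v * v for v in w))
    return [v / norm for v in w] if norm > 0 else w


def select_top_k(X, y, k):
    """Keep the k features with the largest absolute first-component PLS weight."""
    w = pls_first_weights(X, y)
    ranked = sorted(range(len(w)), key=lambda j: abs(w[j]), reverse=True)
    keep = sorted(ranked[:k])
    return keep, [[row[j] for j in keep] for row in X]


# Hypothetical toy data: two descriptors for four character images, two classes.
shape_feats = [[0.1, 5.0], [0.2, 5.1], [0.9, 5.0], [1.0, 4.9]]
texture_feats = [[3.0], [3.0], [3.0], [3.0]]
labels = [0, 0, 1, 1]

fused = serial_fusion(shape_feats, texture_feats)  # 4 samples x 3 features
keep, reduced = select_top_k(fused, labels, k=1)   # indices kept, reduced matrix
```

In this toy case only the first shape feature varies with the class label, so it receives the largest weight and is the one retained. A fuller pipeline would pass the reduced features to a classifier.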
