Complementary approaches built as a web service for Arabic handwriting OCR systems via the Amazon Elastic MapReduce (EMR) model

Offering Arabic Optical Character Recognition (OCR) as a web service remains a major challenge for handwritten document recognition. A variety of approaches, methods, algorithms and techniques have been proposed to build powerful Arabic OCR web services. Unfortunately, these methods have not succeeded when applied to large quantities of Arabic handwritten documents. Intensive experiments and observations revealed that some of the existing approaches and techniques are complementary and can be combined to improve the recognition rate. Designing and implementing these sophisticated complementary approaches as web services is generally complex, and they require strong computing power to reach an acceptable recognition speed, especially for large quantities of documents. One possible solution is to exploit distributed computing architectures such as cloud computing. This paper describes the design and implementation of Arabic Handwriting Recognition as a web service (AHR web service) based on the complementary K-Nearest Neighbor (K-NN) / Support Vector Machine (SVM) approach (K-NN/SVM) via the Amazon Elastic MapReduce (EMR) model. The experiments were conducted in a cloud computing environment with a real large-scale handwriting dataset, the Institut für Nachrichtentechnik (IFN) / Ecole Nationale d’Ingénieurs de Tunis (ENIT) database (IFN/ENIT). J-Sim (Java Simulator) was used to generate and analyze statistical results. Experimental results show that the Amazon EMR model constitutes a very promising framework for enhancing the performance of large-scale AHR web services.
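
The abstract does not spell out how the K-NN and SVM stages are combined or how the work is split across map and reduce tasks. As a rough illustration only, the sketch below assumes one common hybrid scheme: K-NN shortlists candidate classes and a one-vs-rest linear SVM decides among them. The MapReduce flow is simulated locally in plain Python. The toy feature vectors, the `SVM_PARAMS` weights, and all function names are hypothetical and are not taken from the paper or from the Amazon EMR API.

```python
from collections import defaultdict
from math import dist

# Hypothetical toy training set: 2-D feature vectors with class labels.
TRAIN = [
    ((0.0, 0.0), "A"), ((0.1, 0.2), "A"),
    ((1.0, 1.0), "B"), ((0.9, 1.1), "B"),
    ((2.0, 0.0), "C"), ((2.1, 0.1), "C"),
]

# Hypothetical "pre-trained" one-vs-rest linear SVM parameters: (weights, bias).
SVM_PARAMS = {
    "A": ((-1.0, -1.0), 1.0),
    "B": ((0.0, 1.0), -0.5),
    "C": ((1.0, -1.0), -1.0),
}


def knn_candidates(x, k=3):
    """Stage 1 (K-NN): labels of the k nearest training samples."""
    nearest = sorted(TRAIN, key=lambda item: dist(x, item[0]))[:k]
    return {label for _, label in nearest}


def svm_score(x, label):
    """Linear decision value w.x + b for one class."""
    weights, bias = SVM_PARAMS[label]
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def classify(x, k=3):
    """Stage 2 (SVM): pick the best-scoring class among the K-NN candidates."""
    candidates = knn_candidates(x, k)
    if len(candidates) == 1:
        return next(iter(candidates))
    return max(sorted(candidates), key=lambda label: svm_score(x, label))


def map_record(record):
    """Map step: classify one (doc_id, features) record, emit (label, doc_id)."""
    doc_id, features = record
    yield classify(features), doc_id


def reduce_label(label, doc_ids):
    """Reduce step: collect all document ids recognized as the same label."""
    return label, sorted(doc_ids)


def run_job(records):
    """Local stand-in for the shuffle phase between map and reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in map_record(record):
            groups[key].append(value)
    return dict(reduce_label(k, v) for k, v in sorted(groups.items()))


if __name__ == "__main__":
    batch = [("w1", (0.05, 0.1)), ("w2", (1.0, 0.9)), ("w3", (1.9, 0.1))]
    print(run_job(batch))
```

Because each record is classified independently in the map step, the batch can in principle be split across as many workers as are available, which is the property that makes this kind of workload a natural fit for a MapReduce-style service.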
