Arabic Islamic Manuscripts Digitization Based on Hybrid K-NN/SVM Approach and Cloud Computing Technologies

In many national libraries and archive centers, most Islamic manuscripts remain in their original form and have not yet been digitized. These documents are rich in knowledge and form part of the heritage of Muslims. The main reasons for digitizing them are to make them easier to access for interested readers who wish to exploit the knowledge they contain, and to improve their durability. This mission is still difficult to achieve because of the weaknesses of existing approaches and algorithms. Researchers around the world have proposed a variety of approaches, algorithms and techniques for building a system powerful enough to digitize such documents. These efforts have not succeeded, because the problem cannot be solved without integrating several strong, complementary approaches, algorithms and techniques that cooperate at the same time. Such a system requires two things: on the one hand, knowledge of which complementary approaches, algorithms and techniques can lead to an acceptable recognition rate for Arabic handwriting; on the other hand, a hardware infrastructure able to host such complex and resource-hungry software and complete the task in a reasonable time. Our idea is twofold. First, we use cloud computing as an infrastructure (IaaS) on which to deploy our combination of K-NN and SVM algorithms for the Arabic Islamic Manuscripts Recognition System (AIMRS). Second, we use cloud Storage as a Service (SaaS) to store and retrieve large volumes of Arabic Islamic manuscripts. Our approach thus provides an adequate platform for the intended digitization system, based on the integration and cooperation of strong complementary approaches.
In addition, our approach offers a number of benefits, such as the ability to store and retrieve large volumes of Islamic manuscripts, fast processing, fast data access, and unlimited storage.
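
The abstract does not say how K-NN and SVM are combined in AIMRS. The sketch below illustrates one common hybrid scheme, and only as an assumption: K-NN shortlists the nearest training samples, and a small linear SVM trained on that shortlist decides the label only when the neighbours disagree. The function names, the two-dimensional feature vectors, and the labels "ba" and "ta" are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter

def knn(train, x, k):
    """Return the k (features, label) pairs closest to x (Euclidean distance)."""
    return sorted(train, key=lambda pair: math.dist(pair[0], x))[:k]

def train_linear_svm(samples, epochs=200, lr=0.1, lam=0.01):
    """Fit a binary linear SVM (labels +1/-1) by subgradient descent on the
    L2-regularised hinge loss. Returns the weight vector and the bias."""
    w, b = [0.0] * len(samples[0][0]), 0.0
    for _ in range(epochs):
        for feats, y in samples:
            margin = y * (sum(wi * xi for wi, xi in zip(w, feats)) + b)
            w = [wi * (1.0 - lr * lam) for wi in w]  # regularisation shrink
            if margin < 1.0:  # sample is inside the margin or misclassified
                w = [wi + lr * y * xi for wi, xi in zip(w, feats)]
                b += lr * y
    return w, b

def hybrid_classify(train, x, k=5):
    """Assumed hybrid rule: accept a unanimous K-NN vote, otherwise let a
    local SVM trained on the neighbours separate the two leading classes."""
    neighbours = knn(train, x, k)
    votes = Counter(label for _, label in neighbours).most_common()
    if len(votes) == 1:
        return votes[0][0]
    first, second = votes[0][0], votes[1][0]
    local = [(f, 1 if lab == first else -1)
             for f, lab in neighbours if lab in (first, second)]
    w, b = train_linear_svm(local)
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return first if score >= 0 else second

# Hypothetical toy data: 2-D feature vectors for two Arabic letter classes.
train = [((0.0, 0.0), "ba"), ((0.0, 1.0), "ba"), ((1.0, 0.0), "ba"),
         ((1.0, 1.0), "ba"), ((0.5, 0.5), "ba"),
         ((5.0, 5.0), "ta"), ((5.0, 6.0), "ta"), ((6.0, 5.0), "ta"),
         ((6.0, 6.0), "ta"), ((5.5, 5.5), "ta")]

print(hybrid_classify(train, (0.2, 0.2), k=3))  # unanimous neighbours
print(hybrid_classify(train, (3.0, 3.0), k=4))  # mixed neighbours, SVM decides
```

In a cloud deployment of the kind the abstract describes, each manuscript page could be classified independently on a separate IaaS instance, since `hybrid_classify` keeps no state between calls. A real system would replace the toy features with descriptors extracted from character images.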
