Fractal-based system for Arabic/Latin, printed/handwritten script identification

In this paper, we present a multilingual system for the automatic identification of Arabic and Latin script in both handwritten and printed form. The proposed scheme is based, first, on a morphological transform of text-line images and, second, on fractal analysis features computed from (i) the original texture of the 2-D images and (ii) their vertical and horizontal projection profiles. Two techniques are used to obtain only 12 features based on fractal multi-dimensions. The system was tested on 1000 prototypes covering various typefaces, writer styles, and sizes. The discrimination accuracy is about 96.64% using a k-nearest neighbors (KNN) classifier and 98.72% using a radial basis function (RBF) classifier. The experimental results show the importance of the proposed approach.
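As an illustration of the kind of feature described above, the following is a minimal sketch of a fractal dimension estimate applied to a binarized text-line image and to its two projection profiles. The abstract does not name the two estimation techniques used, so box counting is an assumption here, and the function names (`box_counting_dimension`, `profile_graph`, `fractal_features`) and the box sizes are hypothetical. The sketch yields three values, not the 12 features of the proposed system.

```python
import numpy as np


def box_counting_dimension(binary, sizes=(1, 2, 4, 8, 16)):
    """Estimate the box-counting dimension of a 2-D binary array.

    Assumption: box counting stands in for the (unnamed) estimators
    used in the paper.
    """
    binary = np.asarray(binary, dtype=bool)
    if not binary.any():
        raise ValueError("image has no foreground pixels")
    counts = []
    for s in sizes:
        # Zero-pad so both dimensions are multiples of the box size s.
        h, w = binary.shape
        padded = np.pad(binary, ((0, (-h) % s), (0, (-w) % s)))
        H, W = padded.shape
        # Split into s-by-s boxes and count boxes containing foreground.
        blocks = padded.reshape(H // s, s, W // s, s)
        counts.append(blocks.any(axis=(1, 3)).sum())
    # The dimension is the slope of log N(s) against log(1/s).
    slope, _ = np.polyfit(np.log(1.0 / np.array(sizes)), np.log(counts), 1)
    return float(slope)


def profile_graph(profile):
    """Rasterize a 1-D integer profile as a binary curve image."""
    profile = np.asarray(profile, dtype=int)
    img = np.zeros((profile.max() + 1, profile.size), dtype=bool)
    img[profile, np.arange(profile.size)] = True
    return img


def fractal_features(line_img):
    """Return dimensions of the image and of its two projection profiles."""
    line_img = np.asarray(line_img, dtype=bool)
    horizontal = line_img.sum(axis=1)  # foreground pixels per row
    vertical = line_img.sum(axis=0)    # foreground pixels per column
    return [
        box_counting_dimension(line_img),
        box_counting_dimension(profile_graph(horizontal)),
        box_counting_dimension(profile_graph(vertical)),
    ]
```

A feature vector built this way could then be passed to any standard classifier, such as the KNN or RBF classifiers mentioned in the abstract. The choice of box sizes affects the estimate and would need tuning on real text-line images.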
