Classification of Arabic Writer Based on Clustering Techniques

Categorizing Arabic text is a challenging pattern-recognition problem. We propose, for the first time, a novel holistic clustering-based method for classifying Arabic writers. The categorization proceeds in stages. First, the document images are segmented into lines, words, and characters. Second, structural and statistical features are extracted from the segmented portions. Third, the F-measure is used to evaluate the extracted features, individually and in combination, under different linkage methods, distance measures, and numbers of clusters. Finally, experiments are conducted on the standard KHATT dataset of Arabic handwritten text, which comprises varied samples from 1000 writers. In the generation step, results are obtained from multiple runs of the individual clustering methods for each distance measure. The best results are achieved with the intensity and line-slope features and their combination. The experiments demonstrate that, given a good feature set, different numbers of clusters can deliver significant improvements in clustering handwritten structures.
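The segmentation and feature-extraction stages can be illustrated at the line level with a short plain-Python sketch. It is hypothetical throughout. The following are all assumptions rather than the paper's definitions:

- the grayscale page representation (0 = black ink, 255 = white paper);
- the ink threshold;
- the projection-profile line segmentation;
- the definition of intensity (mean darkness of ink pixels);
- the definition of line slope (least-squares slope of per-column ink centroids).

Word and character segmentation could follow the same idea using vertical projections within each line.

```python
"""Hypothetical sketch: text-line segmentation plus two per-line features.

Assumptions (not taken from the paper): the page is a grayscale image given as
a list of rows with 0 = black ink and 255 = white paper; a pixel counts as ink
when its value is below INK_THRESHOLD; a text line is a run of consecutive rows
whose horizontal projection contains at least one ink pixel.
"""
from typing import List, Tuple

Image = List[List[int]]
INK_THRESHOLD = 128  # hypothetical binarization threshold


def segment_lines(page: Image) -> List[Tuple[int, int]]:
    """Return (first_row, last_row) spans of text lines from the horizontal projection profile."""
    profile = [sum(1 for px in row if px < INK_THRESHOLD) for row in page]
    spans, start = [], None
    for y, count in enumerate(profile):
        if count > 0 and start is None:
            start = y  # entering a text line
        elif count == 0 and start is not None:
            spans.append((start, y - 1))  # leaving a text line
            start = None
    if start is not None:  # a line touching the bottom edge
        spans.append((start, len(profile) - 1))
    return spans


def intensity(page: Image, top: int, bottom: int) -> float:
    """Mean darkness (255 - gray level) of the ink pixels in a line strip."""
    ink = [255 - px for row in page[top:bottom + 1] for px in row if px < INK_THRESHOLD]
    return sum(ink) / len(ink) if ink else 0.0


def line_slope(page: Image, top: int, bottom: int) -> float:
    """Least-squares slope of per-column ink centroids, in image coordinates
    (rows grow downward, so a positive slope means the line descends to the right)."""
    xs, ys = [], []
    for x in range(len(page[0])):
        rows = [y for y in range(top, bottom + 1) if page[y][x] < INK_THRESHOLD]
        if rows:
            xs.append(x)
            ys.append(sum(rows) / len(rows))
    if len(xs) < 2:
        return 0.0
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx if sxx else 0.0


def line_features(page: Image) -> List[Tuple[float, float]]:
    """One (intensity, slope) feature vector per segmented text line."""
    return [(intensity(page, t, b), line_slope(page, t, b)) for t, b in segment_lines(page)]
```

Calling `line_features(page)` returns one `(intensity, slope)` pair per detected text line. Per-writer feature vectors could then be formed, for instance, by averaging these pairs over all of a writer's lines.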
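The evaluation stage clusters per-writer feature vectors under several linkage methods, distance measures, and numbers of clusters, then scores each partition with the F-measure. It can be sketched in plain Python. This is a minimal illustration, not the authors' implementation. The following are all assumptions:

- the naive agglomerative algorithm;
- the three distances and three linkages shown;
- the class-size-weighted F-measure, in which each true writer is matched to its best-scoring cluster.

The names `agglomerate`, `f_measure`, and `grid_search` are hypothetical.

```python
"""Hypothetical sketch: agglomerative clustering of writer feature vectors,
scored with a clustering F-measure.

Assumptions (not taken from the paper): naive O(n^3) agglomerative clustering,
the linkage/distance choices below, and the weighted F-measure
F = sum over classes c of (n_c / n) * max over clusters k of F(c, k).
"""
import math
from collections import Counter
from itertools import product
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]

DISTANCES: Dict[str, Callable[[Vector, Vector], float]] = {
    "euclidean": math.dist,
    "cityblock": lambda a, b: sum(abs(x - y) for x, y in zip(a, b)),
    "chebyshev": lambda a, b: max(abs(x - y) for x, y in zip(a, b)),
}

LINKAGES: Dict[str, Callable[[List[float]], float]] = {
    "single": min,                            # closest pair of members
    "complete": max,                          # farthest pair of members
    "average": lambda ds: sum(ds) / len(ds),  # mean over all member pairs
}


def agglomerate(points: Sequence[Vector], k: int, linkage: str, distance: str) -> List[int]:
    """Repeatedly merge the two closest clusters until k remain; return a cluster id per point."""
    if k < 1:
        raise ValueError("k must be at least 1")
    dist, link = DISTANCES[distance], LINKAGES[linkage]
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = link([dist(points[i], points[j]) for i in clusters[a] for j in clusters[b]])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters.pop(b))  # b > a, so index a is unaffected by the pop
    labels = [0] * len(points)
    for cid, members in enumerate(clusters):
        for i in members:
            labels[i] = cid
    return labels


def f_measure(truth: Sequence[int], pred: Sequence[int]) -> float:
    """Class-weighted F-measure: each true class is matched to its best cluster."""
    n = len(truth)
    class_sizes, cluster_sizes = Counter(truth), Counter(pred)
    joint = Counter(zip(truth, pred))
    total = 0.0
    for c, n_c in class_sizes.items():
        best = 0.0
        for k, n_k in cluster_sizes.items():
            n_ck = joint[(c, k)]
            if n_ck:
                p, r = n_ck / n_k, n_ck / n_c  # precision and recall of cluster k for class c
                best = max(best, 2 * p * r / (p + r))
        total += n_c / n * best
    return total


def grid_search(points: Sequence[Vector], writers: Sequence[int], ks: Sequence[int]):
    """Score every (linkage, distance, k) combination with the F-measure."""
    return {
        (link, dist, k): f_measure(writers, agglomerate(points, k, link, dist))
        for link, dist, k in product(LINKAGES, DISTANCES, ks)
    }
```

For example, `grid_search(points, writers, [2, 3, 4])` returns a dictionary mapping each `(linkage, distance, k)` triple to its F-measure, so `max(scores, key=scores.get)` picks the best configuration. The naive loop recomputes every inter-cluster distance at each merge. That is acceptable for illustration but far too slow for 1000 writers, where a library routine such as SciPy's `scipy.cluster.hierarchy.linkage` would be the practical choice.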
