Skeleton extraction: Comparison of five methods on the Arabic IFN/ENIT database

Thinning (skeletonization) is a crucial stage in Arabic Character Recognition (ACR) systems. It simplifies the shape of the text and reduces the amount of data to be handled, and it is commonly used as a pre-processing stage for recognition and storage systems. The skeleton of Arabic text can be used for baseline detection, character segmentation, and feature extraction, all of which ultimately support classification. In this paper, five state-of-the-art thinning algorithms are selected and implemented: SPTA, the Zhang-Suen parallel thinning algorithm, the Huang-Wan-Liu thinning algorithm, and the morphological-operation-based thinning and skeletonization algorithms. The five algorithms are applied to the IFN/ENIT dataset, and their results are discussed and analyzed with respect to preserving the shape and connectivity of the text, preventing spurious tails, maintaining a one-pixel-wide skeleton, avoiding the necking problem, and running time. In addition, performance measures for checking text connectivity, detecting spurious tails, and calculating stroke thickness are proposed and applied.
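As an illustration of one of the compared methods, the following is a minimal sketch of Zhang-Suen parallel thinning in plain Python, written from the standard description of the algorithm rather than from the authors' implementation. The function names (`zhang_suen`, `count_components`) and the list-of-lists image format (1 = text pixel, 0 = background) are assumptions made for this example. The component count is only a simple stand-in for a connectivity check and is not the measure proposed in the paper.

```python
def neighbours(img, r, c):
    """Return P2..P9, the 8 neighbours of (r, c), clockwise from north."""
    return [img[r-1][c], img[r-1][c+1], img[r][c+1], img[r+1][c+1],
            img[r+1][c], img[r+1][c-1], img[r][c-1], img[r-1][c-1]]

def transitions(n):
    """Count 0 -> 1 transitions in the circular sequence P2, ..., P9, P2."""
    seq = n + n[:1]
    return sum(a == 0 and b == 1 for a, b in zip(seq, seq[1:]))

def zhang_suen(image):
    """Thin a binary image (lists of 0/1) with the two sub-iteration scheme."""
    h, w = len(image), len(image[0])
    # Pad with a background border so every pixel has 8 neighbours.
    img = [[0] * (w + 2)] + [[0] + list(row) + [0] for row in image] + [[0] * (w + 2)]
    changed = True
    while changed:
        changed = False
        for step in (0, 1):
            to_delete = []
            for r in range(1, h + 1):
                for c in range(1, w + 1):
                    if img[r][c] != 1:
                        continue
                    n = neighbours(img, r, c)
                    p2, _, p4, _, p6, _, p8, _ = n
                    if not (2 <= sum(n) <= 6 and transitions(n) == 1):
                        continue
                    if step == 0:
                        ok = p2 * p4 * p6 == 0 and p4 * p6 * p8 == 0
                    else:
                        ok = p2 * p4 * p8 == 0 and p2 * p6 * p8 == 0
                    if ok:
                        to_delete.append((r, c))
            # Parallel update: delete only after the whole pass is evaluated.
            for r, c in to_delete:
                img[r][c] = 0
            changed = changed or bool(to_delete)
    return [row[1:-1] for row in img[1:-1]]

def count_components(image):
    """Count 8-connected foreground components (hypothetical connectivity check)."""
    h, w = len(image), len(image[0])
    seen, count = set(), 0
    for r in range(h):
        for c in range(w):
            if image[r][c] != 1 or (r, c) in seen:
                continue
            count += 1
            stack = [(r, c)]
            seen.add((r, c))
            while stack:
                y, x = stack.pop()
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w and image[ny][nx] == 1
                                and (ny, nx) not in seen):
                            seen.add((ny, nx))
                            stack.append((ny, nx))
    return count

bar = [[1] * 7 for _ in range(3)]          # a 3-pixel-thick horizontal stroke
skeleton = zhang_suen(bar)
print(skeleton)
print(count_components(bar), count_components(skeleton))
```

On this small example the three-pixel-thick stroke is reduced to a one-pixel-wide line that is shorter than the original bar, and the number of connected components is unchanged. Comparing component counts before and after thinning is one simple way to flag a broken skeleton, under the assumptions stated above.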
