Identification of unreliable segments to improve skeletonization of handwriting images

Most existing skeletonization algorithms for handwriting images unavoidably produce undesired artifacts or pattern distortions. This paper presents a method for identifying these unreliable segments in order to improve the skeletons of handwriting images. The method exploits a novel feature called iteration time, under which each unreliable segment can be treated as a set of points with exceptional iteration times. First, the iteration time of each skeleton point is calculated, and an undirected graph is built from the skeleton; its edges are weighted by a distance measure, based on iteration time, between each pair of connected nodes. Then the set of unreliable segments is obtained by a graph clustering algorithm with an effective clustering quality function. Finally, the probability that two joined reliable segments form a continuous pair is estimated by a best-matched method, and cubic B-spline interpolation is applied to reconstruct the unreliable parts of the skeleton. Experimental results show that the proposed method detects unreliable segments effectively and produces a skeleton that is closer to the original writing trajectory.
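The abstract does not define iteration time precisely, so the following is only an illustrative sketch under stated assumptions. It treats the iteration time of a pixel as the pass in which a simple 4-neighbour boundary-peeling procedure removes it (a stand-in for the paper's thinning algorithm), and it weights an edge by the absolute difference in iteration time between two connected points. The names `peel_iteration_times` and `edge_weight` are hypothetical and do not come from the paper.

```python
# Illustrative sketch only: boundary peeling is an assumed proxy for the
# thinning procedure, and the edge weight is an assumed distance measure.

NEIGHBOURS_4 = ((1, 0), (-1, 0), (0, 1), (0, -1))


def peel_iteration_times(image):
    """For each foreground pixel, return the peeling pass that removes it.

    `image` is a list of rows containing 0 (background) or 1 (ink).
    Background pixels keep the value 0 in the result.
    """
    rows, cols = len(image), len(image[0])
    remaining = {(r, c) for r in range(rows) for c in range(cols) if image[r][c]}
    times = [[0] * cols for _ in range(rows)]
    iteration = 0
    while remaining:
        iteration += 1
        # A pixel lies on the boundary if any 4-neighbour is background,
        # outside the image, or was peeled in an earlier pass.
        boundary = {
            (r, c)
            for (r, c) in remaining
            if any((r + dr, c + dc) not in remaining for dr, dc in NEIGHBOURS_4)
        }
        for r, c in boundary:
            times[r][c] = iteration
        remaining -= boundary
    return times


def edge_weight(times, p, q):
    """Assumed distance between connected points p and q: |t(p) - t(q)|."""
    return abs(times[p[0]][p[1]] - times[q[0]][q[1]])


# A solid 5 x 7 block of ink: deeper pixels are removed in later passes.
image = [[1] * 7 for _ in range(5)]
times = peel_iteration_times(image)
for row in times:
    print(row)
print(edge_weight(times, (2, 1), (2, 2)))
```

Under this proxy, points where strokes cross or touch sit deeper inside the ink than points on a plain stroke, so they receive larger values. That is one plausible reading of "exceptional iteration times", and a graph clustering step could then group points separated by small edge weights.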
