Classification of Chinese Characters Using Pseudo Skeleton Features

In this paper we present a novel method for classifying machine-printed Chinese characters by matching code strings generated from pseudo skeleton features. Instead of relying on skeletons produced by traditional thinning algorithms, our approach extracts the pseudo skeletons of the characters. The pseudo skeleton features of the input character and of a template character are each encoded into a code string. The edit-distance algorithm is then applied to the two code strings to compute the similarity between the two characters. The main contribution of this paper is the effective classification of multi-font Chinese characters using a single-font reference database. Experiments were conducted on 5401 commonly used Chinese characters in various fonts and sizes. The experimental results demonstrate the validity and efficiency of the proposed method for classifying Chinese characters.
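The matching step can be illustrated with a minimal sketch, assuming unit-cost insertions, deletions, and substitutions (the classic dynamic-programming edit distance). The code alphabet used in the example strings, the length-normalized `similarity` function, and the `classify` helper are hypothetical illustrations; the abstract does not specify the actual encoding, cost weights, or decision rule.

```python
def edit_distance(a: str, b: str) -> int:
    """Unit-cost edit distance between two code strings (dynamic programming)."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Assumed normalization: 1.0 for identical strings, 0.0 for maximal distance."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


def classify(input_code: str, templates: dict[str, str]) -> str:
    """Return the label of the template whose code string is most similar."""
    return max(templates, key=lambda label: similarity(input_code, templates[label]))


# Hypothetical code strings; the letters do not reflect the paper's encoding.
templates = {"char_A": "HVHD", "char_B": "DDVV"}
best = classify("HVHH", templates)
```

Here `templates` plays the role of the single-font reference database: each input code string is compared against every template string, and the closest template determines the class.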
