An EMD-based recognition method for Chinese fonts and styles

This paper presents a novel method for recognizing Chinese fonts based on empirical mode decomposition (EMD). After analyzing and comparing a large number of Chinese characters, five basic strokes were selected to characterize the stroke features of Chinese fonts. From these strokes, the stroke feature sequences of a given text block are calculated. Decomposing each sequence with EMD yields a set of intrinsic mode functions; the first two, which have the highest frequencies, are used to compute the stroke high frequency energy, defined as the average energy of these two intrinsic mode functions over the length of the sequence. The stroke high frequency energies of the five basic strokes are combined with the averages of the five residues, called stroke low frequency energies, to form a 10-dimensional feature vector. Finally, a minimum distance classifier is used to recognize the fonts, and encouraging experimental results have been obtained. The main advantages of the algorithm are that (1) the feature dimension is very low; (2) fewer samples are needed to train the classifier; and (3) most importantly, it is the first attempt to apply the new theory of the Hilbert-Huang transform to document analysis and recognition.
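The feature construction described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the EMD step has already been run by some external routine, so each of the five stroke feature sequences arrives as a list of intrinsic mode functions plus a residue. The function names, the reading of "average energy" as the sum of squared samples divided by the sequence length, the use of Euclidean distance, and the per-font prototype vectors are all assumptions made for illustration.

```python
from math import dist


def high_frequency_energy(imfs, n_high=2):
    """Assumed definition: summed squared samples of the first n_high IMFs,
    divided by the sequence length."""
    length = len(imfs[0])
    return sum(v * v for imf in imfs[:n_high] for v in imf) / length


def low_frequency_energy(residue):
    """Average of the EMD residue of one stroke feature sequence."""
    return sum(residue) / len(residue)


def feature_vector(decompositions):
    """Build the 10-dimensional vector from five (imfs, residue) pairs,
    one pair per basic stroke: five high-frequency values, then five low."""
    high = [high_frequency_energy(imfs) for imfs, _ in decompositions]
    low = [low_frequency_energy(residue) for _, residue in decompositions]
    return high + low


def nearest_font(feature, prototypes):
    """Minimum distance classifier (Euclidean distance is an assumption):
    return the font whose prototype vector is closest to the feature."""
    return min(prototypes, key=lambda name: dist(feature, prototypes[name]))


# Toy decomposition reused for all five strokes (made-up numbers).
toy_imfs = [[1, -1, 1, -1], [0.5, 0.5, -0.5, -0.5], [9, 9, 9, 9]]
toy_residue = [2, 2, 2, 2]
toy_feature = feature_vector([(toy_imfs, toy_residue)] * 5)

# Hypothetical prototypes, e.g. per-font means of training feature vectors.
toy_prototypes = {"Song": [1.0] * 5 + [2.0] * 5, "Kai": [5.0] * 10}
toy_label = nearest_font(toy_feature, toy_prototypes)
```

In this sketch only the first two intrinsic mode functions contribute to the high frequency term, so the third one in the toy data is ignored. A real pipeline would replace the toy lists with the output of an actual EMD routine applied to each stroke feature sequence.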
