Gaussian mixture modeling and learning of neighboring characters for multilingual text extraction in images

This paper proposes an approach based on the statistical modeling and learning of neighboring characters to extract multilingual texts in images. The case of three neighboring characters is represented as the Gaussian mixture model and discriminated from other cases by the corresponding 'pseudo-probability' defined under Bayes framework. Based on this modeling, text extraction is completed through labeling each connected component in the binary image as character or non-character according to its neighbors, where a mathematical morphology based method is introduced to detect and connect the separated parts of each character, and a Voronoi partition based method is advised to establish the neighborhoods of connected components. We further present a discriminative training algorithm based on the maximum-minimum similarity (MMS) criterion to estimate the parameters in the proposed text extraction approach. Experimental results in Chinese and English text extraction demonstrate the effectiveness of our approach trained with the MMS algorithm, which achieved the precision rate of 93.56% and the recall rate of 98.55% for the test data set. In the experiments, we also show that the MMS provides significant improvement of overall performance, compared with influential training criterions of the maximum likelihood (ML) and the maximum classification error (MCE).

[1]  Shih-Fu Chang,et al.  Learning to Detect Scene Text Using a Higher-Order MRF with Belief Propagation , 2004, 2004 Conference on Computer Vision and Pattern Recognition Workshop.

[2]  John E. Beasley,et al.  A delaunay triangulation-based heuristic for the euclidean steiner problem , 1994, Networks.

[3]  Hong Yan Fuzzy curve-tracing algorithm , 2001, IEEE Trans. Syst. Man Cybern. Part B.

[4]  Anil K. Jain,et al.  Locating text in complex color images , 1995, Pattern Recognit..

[5]  Apostolos Antonacopoulos,et al.  Text extraction from Web images based on a split-and-merge segmentation method using colour perception , 2004, ICPR 2004.

[6]  Perry D. Moerland A comparison of mixture models for density estimation , 1999 .

[7]  Edward M. Riseman,et al.  TextFinder: An Automatic System to Detect and Recognize Text In Images , 1999, IEEE Trans. Pattern Anal. Mach. Intell..

[8]  Anil K. Jain,et al.  Text information extraction in images and video: a survey , 2004, Pattern Recognit..

[9]  Nikos A. Vlassis,et al.  A kurtosis-based dynamic approach to Gaussian mixture modeling , 1999, IEEE Trans. Syst. Man Cybern. Part A.

[10]  Anil K. Jain,et al.  Automatic text location in images and video frames , 1998, Proceedings. Fourteenth International Conference on Pattern Recognition (Cat. No.98EX170).

[11]  L. Armijo Minimization of functions having Lipschitz continuous first partial derivatives. , 1966 .

[12]  Biing-Hwang Juang,et al.  Minimum classification error rate methods for speech recognition , 1997, IEEE Trans. Speech Audio Process..

[13]  Simon M. Lucas,et al.  ICDAR 2003 robust reading competitions , 2003, Seventh International Conference on Document Analysis and Recognition, 2003. Proceedings..

[14]  Ellen K. Hughes,et al.  Video OCR for digital news archive , 1998, Proceedings 1998 IEEE International Workshop on Content-Based Access of Image and Video Database.

[15]  Hong Yan,et al.  Convergence condition and efficient implementation of the fuzzy curve-tracing (FCT) algorithm , 2004, IEEE Trans. Syst. Man Cybern. Part B.

[16]  D. Rubin,et al.  Maximum likelihood from incomplete data via the EM - algorithm plus discussions on the paper , 1977 .

[17]  Beom-Joon Cho,et al.  Locating characters in scene images using frequency features , 2002, Object recognition supported by user interaction for service robots.