Integrated Segmentation and Recognition of Mixed Chinese/English Document

This paper presents a general framework for integrating segmentation and recognition, together with a novel method for identifying the language (Chinese or English) of each character in a mixed Chinese/English document. The method has four distinguishing features. First, the processing unit is a text line rather than a single character segment. Second, multiple features are used, built on multi-phase segmentation. Third, two types of feedback, one from character recognition and one from character feature statistics within a text line, are used throughout segmentation and recognition. Fourth, the method adapts to the quality and genre of the document.
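
As a rough illustration of feedback from character feature statistics within a text line, the sketch below labels candidate segments as Chinese or English using only their widths. It is not the paper's algorithm: the function name `label_segments`, the single width feature, and the 0.8 ratio are all assumptions chosen for the example. It relies only on the common observation that printed Chinese characters are roughly square, while most English letters are narrower.

```python
from statistics import median

def label_segments(widths, line_height, wide_ratio=0.8):
    """Label each segment width as 'zh' (roughly square) or 'en' (narrow).

    Hypothetical helper: the width-only feature and the 0.8 ratio are
    illustrative assumptions, not values taken from the paper.
    """
    # Line-level statistic: typical width of the wide (square-like) segments.
    wide = [w for w in widths if w >= wide_ratio * line_height]
    # Fall back to the line height when the line has no wide segments.
    ref = median(wide) if wide else line_height
    # Feed the line statistic back into the per-segment decision.
    return ["zh" if w >= wide_ratio * ref else "en" for w in widths]
```

For a line of height 32 with segment widths `[30, 31, 12, 14, 29, 10]`, this returns `['zh', 'zh', 'en', 'en', 'zh', 'en']`. A fuller system would combine such line-level statistics with recognizer confidence before committing to a segmentation.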
