A Character Segmentation Method without Character Verification

Many digital watermarking schemes have been proposed to protect paper-based documents, and character segmentation plays an important role in both their embedding and detection processes. However, the character segmentation methods currently used in OCR (optical character recognition) are too costly in time and computation to suit this purpose. This paper proposes a method that segments mixed Chinese/English characters without an OCR phase by combining statistical structural data with periodicity. Experimental results show that the method performs character segmentation and language discrimination effectively and can improve the performance of digital watermarking schemes designed for paper-based documents.
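
As a rough illustration of OCR-free segmentation, the sketch below splits a binarized text line at empty columns and estimates a dominant character pitch from the autocorrelation of the column profile. It is not the method of this paper: the function names, the binary input convention (1 = ink), and the use of autocorrelation as the periodicity measure are all assumptions made for the example.

```python
import numpy as np

def column_profile(line_img):
    """Count ink pixels in each column of a binary text-line image (1 = ink)."""
    return np.asarray(line_img, dtype=float).sum(axis=0)

def split_on_gaps(profile):
    """Return (start, end) column spans of consecutive non-empty columns.

    `end` is exclusive. Assumption: characters are separated by at least
    one completely empty column, which touching glyphs would violate.
    """
    ink = profile > 0
    padded = np.concatenate(([False], ink, [False]))
    # Indices where the ink/no-ink state flips: starts and ends alternate.
    changes = np.flatnonzero(padded[1:] != padded[:-1])
    return [(int(s), int(e)) for s, e in zip(changes[0::2], changes[1::2])]

def estimate_pitch(profile, min_lag=2):
    """Estimate the dominant character pitch (in columns).

    Hypothetical periodicity measure: the lag, between min_lag and half
    the line width, at which the autocorrelation of the profile peaks.
    """
    centered = profile - profile.mean()
    full = np.correlate(centered, centered, mode="full")
    ac = full[len(profile) - 1:]          # keep non-negative lags only
    max_lag = len(profile) // 2
    return int(np.argmax(ac[min_lag:max_lag + 1]) + min_lag)

# Synthetic line: five 6-column "characters" separated by 2-column gaps.
line = np.zeros((10, 40))
for k in range(5):
    line[:, 8 * k: 8 * k + 6] = 1

profile = column_profile(line)
spans = split_on_gaps(profile)
pitch = estimate_pitch(profile)
```

On real scans, spans whose width differs markedly from the estimated pitch could be candidates for merging, splitting, or language discrimination, but any such rule would need thresholds tuned on actual document data.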
