Text Recognition Algorithm Based on Text Features

Text watermarking algorithms based on natural language are difficult to implement, and format-based text watermarking algorithms have poor robustness against format attacks. This paper presents a new text recognition algorithm based on text features. Words are segmented and extracted according to the text features. The feature dimensions are reduced using latent semantic analysis (LSA) and a stop-word database. A new similarity measure is also defined, and it is used to determine the threshold for watermark detection. The experimental results indicate that the proposed algorithm has better operating efficiency and stronger robustness than previous approaches. The algorithm also handles text documents written in Chinese and in English effectively.
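The pipeline described above (word extraction, stop-word removal, similarity against a threshold) can be illustrated with a minimal Python sketch. It is not the paper's implementation. The stop-word list, the function names, and the threshold value of 0.8 are hypothetical. Cosine similarity over term frequencies stands in for the paper's own similarity measure, which is not specified here. The LSA reduction step is omitted to keep the sketch dependency-free, and the tokenizer handles only English; Chinese text would need a dedicated word segmenter.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list; the paper's stop-word database is not given.
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "in", "to", "on"}


def extract_features(text):
    """Split text into lowercase word tokens and drop stop words.

    Returns a Counter mapping each remaining word to its frequency.
    Whitespace, line breaks, and letter case are ignored, so purely
    format-level changes do not alter the features.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t not in STOP_WORDS)


def cosine_similarity(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def is_match(original, suspect, threshold=0.8):
    """Report whether the suspect text is similar enough to the original.

    The threshold of 0.8 is an assumed value for illustration only.
    """
    score = cosine_similarity(extract_features(original),
                              extract_features(suspect))
    return score >= threshold
```

Because the features depend only on the words themselves, a copy whose spacing, line breaks, or capitalization were changed yields the same feature vector as the original, which is the sense in which a feature-based scheme resists format attacks.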
