Paper Information - Pedigree Tracking in the Face of Ancillary Content

The accurate tracking and retrieval of content pedigree is a rapidly growing requirement as our ability to create information assets increases exponentially. Plagiarism detection, accurate accreditation, and classification tasks all rely on the ability to determine where content is being used and where it originated. We present an approach to document pedigree tracking that is based on an efficient disk-based data structure and the use of two contrasting collections of historical text. These collections enable content of two types (or degrees of importance) to be defined and accounted for when locating documents with overlapping content. This approach is resilient in the face of substantial ancillary content and paraphrasing, two common sources of error in existing content tracking techniques.
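
The abstract does not give the scoring details, so the following is only an illustrative sketch of the general idea of using two contrasting collections to discount ancillary content when measuring overlap. It is not the authors' method: the word 3-gram representation, the add-one smoothed weight, the in-memory counters (the paper describes a disk-based structure), and all names such as `build_counts`, `weight`, and `overlap_score` are assumptions made for illustration.

```python
from collections import Counter


def ngrams(text, n=3):
    """Return the word n-grams of text (lowercased, whitespace-tokenized)."""
    words = text.lower().split()
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def build_counts(collection, n=3):
    """For each n-gram, count how many documents in the collection contain it."""
    counts = Counter()
    for doc in collection:
        counts.update(set(ngrams(doc, n)))
    return counts


def weight(gram, primary_counts, ancillary_counts):
    """Hypothetical weight in (0, 1): n-grams seen mostly in the
    ancillary collection are discounted. Uses add-one smoothing."""
    p = primary_counts[gram] + 1
    a = ancillary_counts[gram] + 1
    return p / (p + a)


def overlap_score(query, candidate, primary_counts, ancillary_counts, n=3):
    """Sum the weights of the n-grams shared by query and candidate."""
    shared = set(ngrams(query, n)) & set(ngrams(candidate, n))
    return sum(weight(g, primary_counts, ancillary_counts) for g in shared)


if __name__ == "__main__":
    # Assumed toy collections: substantive text vs. boilerplate footers.
    primary = ["the suffix tree indexes every substring of the text"]
    ancillary = [
        "this message is confidential and intended for the recipient",
        "this message is confidential and may be privileged",
    ]
    pc, ac = build_counts(primary), build_counts(ancillary)

    query = "the suffix tree indexes every substring this message is confidential"
    substantive = "notes: the suffix tree indexes every substring of a document"
    boilerplate = "this message is confidential and intended for you"

    print(overlap_score(query, substantive, pc, ac))
    print(overlap_score(query, boilerplate, pc, ac))
```

Under these assumptions, a candidate that shares only boilerplate with the query scores lower than one that shares substantive text, which is the behavior the abstract describes as resilience to ancillary content. Handling paraphrase would need a looser match than exact n-grams and is not attempted here.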

Eugene Creswick | Emi Fujioka | Terrance Goan
