Managing déjà vu: Collection building for the identification of nonidentical duplicate documents
[1] Stephen E. Robertson,et al. Building a filtering test collection for TREC 2002 , 2003, SIGIR.
[2] Yi Zhang,et al. Novelty and redundancy detection in adaptive filtering , 2002, SIGIR '02.
[3] Joshua Alspector,et al. Improved robustness of signature-based near-replica detection via lexicon randomization , 2004, KDD.
[4] Daniel Shawcross Wilkerson,et al. Winnowing: local algorithms for document fingerprinting , 2003, SIGMOD '03.
[5] Jack G. Conrad,et al. Online duplicate document detection: signature reliability in a dynamic retrieval environment , 2003, CIKM '03.
[6] Hector Garcia-Molina,et al. Finding Near-Replicas of Documents and Servers on the Web , 1998, WebDB.
[7] S. Siegel,et al. Nonparametric Statistics for the Behavioral Sciences .
[8] Justin Zobel,et al. Methods for Identifying Versioned and Plagiarized Documents , 2003, J. Assoc. Inf. Sci. Technol..
[9] Carmen Miller. Detecting duplicates: a searcher's dream come true , 1990 .
[10] Hector Garcia-Molina,et al. Copy detection mechanisms for digital documents , 1995, SIGMOD '95.
[11] Peter Bailey,et al. Overview of the TREC-8 Web Track , 2000, TREC.
[12] David M. Pennock,et al. Analysis of lexical signatures for finding lost or related documents , 2002, SIGIR '02.
[13] Chris Buckley,et al. OHSUMED: an interactive retrieval evaluation and new large test collection for research , 1994, SIGIR '94.
[14] Charles L. A. Clarke,et al. Efficient construction of large test collections , 1998, SIGIR '98.
[15] Peter Schäuble,et al. Building a Large Multilingual Test Collection from Comparable News Documents , 1998 .
[16] James W. Cooper,et al. Detecting similar documents using salient terms , 2002, CIKM '02.
[17] C. J. van Rijsbergen,et al. Report on the need for and provision of an 'ideal' information retrieval test collection , 1975 .
[18] Geoffrey Zweig,et al. Syntactic Clustering of the Web , 1997, Comput. Networks.
[19] Marc Najork,et al. On the evolution of clusters of near-duplicate Web pages , 2003, LA-WEB.
[20] Helen R. Tibbo,et al. The Cystic Fibrosis Database: Content and Research Opportunities , 1991 .
[21] Ellen M. Voorhees. Variations in relevance judgments and the measurement of retrieval effectiveness , 2000, Inf. Process. Manag..
[22] Robert Burgin. Variations in Relevance Judgments and the Evaluation of Retrieval Performance , 1992, Inf. Process. Manag..
[23] Ellen M. Voorhees,et al. Evaluating Evaluation Measure Stability , 2000, SIGIR 2000.
[24] Mark Stevenson,et al. The Reuters Corpus Volume 1 - from Yesterday’s News to Tomorrow’s Language Resources , 2002, LREC.
[25] Ophir Frieder,et al. Collection statistics for fast duplicate document detection , 2002, TOIS.
[26] Donna K. Harman,et al. Overview of the Sixth Text REtrieval Conference (TREC-6) , 1997, Inf. Process. Manag..
[27] Stephen P. Harter. Variations in relevance assessments and the measurement of retrieval effectiveness , 1996 .
[28] Jeannette M. Wing,et al. Model checking electronic commerce protocols , 1996 .
[29] Peter Jackson,et al. Natural language processing for online applications : text retrieval, extraction and categorization , 2002 .
[30] Peter Jackson,et al. Briefly noted: natural language processing for online applications: Text retrieval, extraction, and categorization , 2003 .
[31] Sergey Brin,et al. The Anatomy of a Large-Scale Hypertextual Web Search Engine , 1998, Comput. Networks.
[32] K. Sparck Jones,et al. Information retrieval test collections , 1976 .
[33] Stephen E. Robertson,et al. On relevance weights with little relevance information , 1997, SIGIR '97.
[34] Howard R. Turtle. Natural language vs. Boolean query evaluation: a comparison of retrieval performance , 1994, SIGIR '94.
[35] M. Sanderson,et al. Duplicate Detection in the Reuters Collection , 1997 .
[36] Daniel Marcu,et al. The automatic construction of large-scale corpora for summarization research , 1999, SIGIR '99.
[37] Andrew Levison. Ziff-Davis: sale of publishing giant impacts online industry , 1994 .
[38] Udi Manber,et al. Finding Similar Files in a Large File System , 1994, USENIX Winter.
[39] Jean Carletta,et al. Assessing Agreement on Classification Tasks: The Kappa Statistic , 1996, CL.