Managing déjà vu: Collection building for the identification of nonidentical duplicate documents
[1] Justin Zobel, et al. Methods for Identifying Versioned and Plagiarized Documents, 2003, J. Assoc. Inf. Sci. Technol.
[2] Sergey Brin, et al. The Anatomy of a Large-Scale Hypertextual Web Search Engine, 1998, Comput. Networks.
[3] Stephen P. Harter, et al. Variations in Relevance Assessments and the Measurement of Retrieval Effectiveness, 1996, J. Am. Soc. Inf. Sci.
[4] Stephen E. Robertson, et al. Building a filtering test collection for TREC 2002, 2003, SIGIR.
[5] Hector Garcia-Molina, et al. Finding near-replicas of documents on the Web, 1999.
[6] Hector Garcia-Molina, et al. Copy detection mechanisms for digital documents, 1995, SIGMOD '95.
[7] Daniel Shawcross Wilkerson, et al. Winnowing: local algorithms for document fingerprinting, 2003, SIGMOD '03.
[8] Marc Najork, et al. On the evolution of clusters of near-duplicate Web pages, 2003, Proceedings of the First Latin American Web Congress (LA-WEB).
[9] Bernice W. Polemis. Nonparametric Statistics for the Behavioral Sciences, 1959.
[10] Charles L. A. Clarke, et al. Efficient construction of large test collections, 1998, SIGIR '98.
[11] Mark Stevenson, et al. The Reuters Corpus Volume 1 - from Yesterday’s News to Tomorrow’s Language Resources, 2002, LREC.
[12] K. Sparck Jones, et al. Information Retrieval Test Collections, 1976.
[13] Cyril W. Cleverdon. The effect of variations in relevance assessments in comparative experimental tests of index languages, 1970.
[14] Carol Tenopir, et al. TARGET and FREESTYLE: DIALOG and Mead join the relevance ranks, 1997.
[15] C. J. van Rijsbergen, et al. Report on the need for and provision of an 'ideal' information retrieval test collection, 1975.
[16] Peter Jackson, et al. Natural language processing for online applications: text retrieval, extraction and categorization, 2002.
[17] Jean Carletta, et al. Assessing Agreement on Classification Tasks: The Kappa Statistic, 1996, CL.
[18] Helen R. Tibbo, et al. The Cystic Fibrosis Database: Content and Research Opportunities, 1991.
[19] Ellen M. Voorhees, et al. Evaluating Evaluation Measure Stability, 2000, SIGIR 2000.
[20] Carmen Miller. Detecting duplicates: a searcher's dream come true, 1990.
[21] Peter Schäuble, et al. Building a Large Multilingual Test Collection from Comparable News Documents, 1998.
[22] James W. Cooper, et al. Detecting similar documents using salient terms, 2002, CIKM '02.
[23] Peter Bailey, et al. Overview of the TREC-8 Web Track, 2000, TREC.
[24] Daniel Marcu, et al. The automatic construction of large-scale corpora for summarization research, 1999, SIGIR '99.
[25] Udi Manber, et al. Finding Similar Files in a Large File System, 1994, USENIX Winter.
[26] Jack G. Conrad, et al. Online duplicate document detection: signature reliability in a dynamic retrieval environment, 2003, CIKM '03.
[27] Geoffrey Zweig, et al. Syntactic Clustering of the Web, 1997, Comput. Networks.
[28] Yi Zhang, et al. Novelty and redundancy detection in adaptive filtering, 2002, SIGIR '02.
[29] Ophir Frieder, et al. Collection statistics for fast duplicate document detection, 2002, TOIS.
[30] Donna K. Harman, et al. Overview of the Sixth Text REtrieval Conference (TREC-6), 1997, Inf. Process. Manag.
[31] Howard R. Turtle. Natural language vs. Boolean query evaluation: a comparison of retrieval performance, 1994, SIGIR '94.
[32] M. Sanderson, et al. Duplicate Detection in the Reuters Collection, 1997.
[33] Ellen M. Voorhees. Variations in relevance judgments and the measurement of retrieval effectiveness, 2000, Inf. Process. Manag.
[34] Robert Burgin. Variations in Relevance Judgments and the Evaluation of Retrieval Performance, 1992, Inf. Process. Manag.
[35] David M. Pennock, et al. Analysis of lexical signatures for finding lost or related documents, 2002, SIGIR '02.
[36] Chris Buckley, et al. OHSUMED: an interactive retrieval evaluation and new large test collection for research, 1994, SIGIR '94.
[37] Tefko Saracevic. Users lost: reflections on the past, future, and limits of information science, 1997, SIGIR Forum.