Redundant documents and search effectiveness
[1] Steven Garcia,et al. Access-Ordered Indexes , 2004, ACSC.
[2] Justin Zobel,et al. Methods for Identifying Versioned and Plagiarized Documents , 2003, J. Assoc. Inf. Sci. Technol.
[3] Ian H. Witten,et al. Managing gigabytes (2nd ed.): compressing and indexing documents and images , 1999 .
[4] Hector Garcia-Molina,et al. Finding Near-Replicas of Documents and Servers on the Web , 1998, WebDB.
[5] Marti A. Hearst,et al. Reexamining the cluster hypothesis: scatter/gather on retrieval results , 1996, SIGIR '96.
[6] Hector Garcia-Molina,et al. Finding near-replicas of documents on the Web , 1999 .
[7] Donna K. Harman,et al. Overview of the TREC 2003 Novelty Track , 2003, TREC.
[8] James Allan,et al. Retrieval and novelty detection at the sentence level , 2003, SIGIR.
[9] C. J. van Rijsbergen,et al. Information Retrieval , 1979 .
[10] Marc Najork,et al. On the evolution of clusters of near-duplicate Web pages , 2003, LA-WEB.
[11] Ophir Frieder,et al. Collection statistics for fast duplicate document detection , 2002, TOIS.
[12] Justin Zobel,et al. A Scalable System for Identifying Co-derivative Documents , 2004, SPIRE.
[13] Ian H. Witten,et al. Managing Gigabytes: Compressing and Indexing Documents and Images , 1999 .
[14] Charles L. A. Clarke,et al. Overview of the TREC 2004 Terabyte Track , 2004, TREC.
[15] Ellen M. Voorhees,et al. The effect of topic set size on retrieval experiment error , 2002, SIGIR '02.
[16] Ronald L. Rivest,et al. The MD5 Message-Digest Algorithm , 1992, RFC.
[17] Donna K. Harman,et al. Overview of the TREC 2002 Novelty Track , 2002, TREC.
[18] Yi Zhang,et al. Novelty and redundancy detection in adaptive filtering , 2002, SIGIR '02.
[19] Hector Garcia-Molina,et al. SCAM: A Copy Detection Mechanism for Digital Documents , 1995, DL.
[20] Udi Manber,et al. Finding Similar Files in a Large File System , 1994, USENIX Winter.
[21] Hector Garcia-Molina,et al. Finding replicated Web collections , 2000, SIGMOD '00.
[22] Hector Garcia-Molina,et al. Copy detection mechanisms for digital documents , 1995, SIGMOD '95.
[23] Alexander Dekhtyar,et al. Information Retrieval , 2018, Lecture Notes in Computer Science.
[24] Ellen M. Voorhees,et al. Evaluating Evaluation Measure Stability , 2000, SIGIR 2000.
[25] Daniel Shawcross Wilkerson,et al. Winnowing: local algorithms for document fingerprinting , 2003, SIGMOD '03.
[26] Geoffrey Zweig,et al. Syntactic Clustering of the Web , 1997, Comput. Networks.
[27] Hector Garcia-Molina,et al. Finding replicated Web collections , 2000, SIGMOD 2000.
[28] Mark Sanderson,et al. Information retrieval system evaluation: effort, sensitivity, and reliability , 2005, SIGIR '05.