CentralMatch: A Fast and Accurate Method to Identify Blog-Duplicates
[1] Grace Hui Yang, et al. Near-duplicate detection by instance-level constrained clustering, 2006, SIGIR.
[2] Hector Garcia-Molina, et al. Copy detection mechanisms for digital documents, 1995, SIGMOD '95.
[3] Geoffrey Zweig, et al. Syntactic Clustering of the Web, 1997, Comput. Networks.
[4] Andreas Paepcke, et al. SpotSigs: Near Duplicate Detection in Web Page Collections, 2007.
[5] Justin Zobel, et al. Methods for Identifying Versioned and Plagiarized Documents, 2003, J. Assoc. Inf. Sci. Technol.
[6] Andrei Z. Broder, et al. Identifying and Filtering Near-Duplicate Documents, 2000, CPM.
[7] Ophir Frieder, et al. Collection statistics for fast duplicate document detection, 2002, TOIS.
[8] Monika Henzinger, et al. Finding near-duplicate web pages: a large-scale evaluation of algorithms, 2006, SIGIR.
[9] Dan Gusfield, et al. Algorithms on Strings, Trees, and Sequences - Computer Science and Computational Biology, 1997.
[10] Hector Garcia-Molina, et al. Finding near-replicas of documents on the Web, 1999.
[11] Hans-Peter Kriegel, et al. The R*-tree: an efficient and robust access method for points and rectangles, 1990, SIGMOD '90.
[12] David M. Pennock, et al. Analysis of lexical signatures for finding lost or related documents, 2002, SIGIR '02.
[13] Daniel Shawcross Wilkerson, et al. Winnowing: local algorithms for document fingerprinting, 2003, SIGMOD '03.
[14] Dan Klein, et al. Evaluating strategies for similarity search on the web, 2002, WWW '02.
[15] Jack G. Conrad, et al. Online duplicate document detection: signature reliability in a dynamic retrieval environment, 2003, CIKM '03.
[16] Sergey Brin, et al. The Anatomy of a Large-Scale Hypertextual Web Search Engine, 1998, Comput. Networks.
[17] Alan M. Frieze, et al. Min-Wise Independent Permutations, 2000, J. Comput. Syst. Sci.
[18] Hector Garcia-Molina, et al. SCAM: A Copy Detection Mechanism for Digital Documents, 1995, DL.