Probabilistic near-duplicate detection using simhash
[1] Benno Stein, et al. Strategies for retrieving plagiarized documents, 2007, SIGIR.
[2] Geoffrey E. Hinton, et al. Semantic hashing, 2009, Int. J. Approx. Reason.
[3] Peter Wegner, et al. A technique for counting ones in a binary computer, 1960, CACM.
[4] Edith Cohen, et al. Finding interesting associations without support pruning, 2000, Proceedings of 16th International Conference on Data Engineering (Cat. No.00CB37073).
[5] Hans-Peter Kriegel, et al. Incremental Clustering for Mining in a Data Warehousing Environment, 1998, VLDB.
[6] Piotr Indyk, et al. Approximate nearest neighbors: towards removing the curse of dimensionality, 1998, STOC '98.
[7] Gurmeet Singh Manku, et al. Detecting near-duplicates for web crawling, 2007, WWW '07.
[8] Alan M. Frieze, et al. Min-Wise Independent Permutations, 2000, J. Comput. Syst. Sci.
[9] Andrei Z. Broder, et al. Identifying and Filtering Near-Duplicate Documents, 2000, CPM.
[10] Mayank Bawa, et al. LSH forest: self-tuning indexes for similarity search, 2005, WWW '05.
[11] Moses Charikar, et al. Similarity estimation techniques from rounding algorithms, 2002, STOC '02.
[12] Ophir Frieder, et al. Collection statistics for fast duplicate document detection, 2002, TOIS.
[13] Dmitri Loguinov, et al. IRLbot: scaling to 6 billion pages and beyond, 2008, WWW.
[14] Jon Louis Bentley, et al. K-d trees for semidynamic point sets, 1990, SCG '90.
[15] Monika Henzinger, et al. Finding near-duplicate web pages: a large-scale evaluation of algorithms, 2006, SIGIR.
[16] Alan M. Frieze, et al. Min-wise independent permutations (extended abstract), 1998, STOC '98.
[17] Geoffrey Zweig, et al. Syntactic Clustering of the Web, 1997, Comput. Networks.
[18] Marios Hadjieleftheriou, et al. R-Trees - A Dynamic Index Structure for Spatial Searching, 2008, ACM SIGSPATIAL International Workshop on Advances in Geographic Information Systems.
[19] Shumeet Baluja, et al. Learning "Forgiving" Hash Functions: Algorithms and Large Scale Tests, 2007, IJCAI.
[20] Piotr Indyk, et al. Approximate Nearest Neighbor: Towards Removing the Curse of Dimensionality, 2012, Theory Comput.