[1] Yun Chi,et al. Blog Community Discovery and Evolution Based on Mutual Awareness Expansion , 2007, IEEE/WIC/ACM International Conference on Web Intelligence (WI'07).
[2] Surajit Chaudhuri,et al. A Primitive Operator for Similarity Joins in Data Cleaning , 2006, 22nd International Conference on Data Engineering (ICDE'06).
[3] John Yen,et al. Probabilistic Community Discovery Using Hierarchical Latent Gaussian Mixture Model , 2007, AAAI.
[4] Alison L Gibbs,et al. On Choosing and Bounding Probability Metrics , 2002, math/0209021.
[5] Moses Charikar,et al. Similarity estimation techniques from rounding algorithms , 2002, STOC '02.
[6] Leonidas J. Guibas,et al. A metric for distributions with applications to image databases , 1998, Sixth International Conference on Computer Vision (IEEE Cat. No.98CH36271).
[7] Trevor Darrell,et al. Approximate Correspondences in High Dimensions , 2006, NIPS.
[8] Piotr Indyk,et al. Approximate Nearest Neighbor: Towards Removing the Curse of Dimensionality , 2012, Theory Comput..
[9] Sung-Hyuk Cha,et al. On measuring the distance between histograms , 2002, Pattern Recognit..
[10] Jiaheng Lu,et al. Efficient Merging and Filtering Algorithms for Approximate String Searches , 2008, 2008 IEEE 24th International Conference on Data Engineering.
[11] Chen Li,et al. Efficient parallel set-similarity joins using MapReduce , 2010, SIGMOD Conference.
[12] Sanjay Ghemawat,et al. The Google file system , 2003 .
[13] Divyakant Agrawal,et al. SLEUTH: Single-pubLisher attack dEtection Using correlaTion Hunting , 2008, Proc. VLDB Endow..
[14] Ahmed Metwally,et al. Estimating the number of users behind ip addresses for combating abusive traffic , 2011, KDD.
[15] Rob Pike,et al. Interpreting the data: Parallel analysis with Sawzall , 2005, Sci. Program..
[16] Divyakant Agrawal,et al. Detectives: detecting coalition hit inflation attacks in advertising networks streams , 2007, WWW '07.
[17] Sunita Sarawagi,et al. Efficient set joins on similarity predicates , 2004, SIGMOD '04.
[18] Roberto J. Bayardo,et al. Scaling up all pairs similarity search , 2007, WWW '07.
[19] Ravi Kumar,et al. Pig latin: a not-so-foreign language for data processing , 2008, SIGMOD Conference.
[20] Jeffrey Xu Yu,et al. Efficient similarity joins for near-duplicate detection , 2011, TODS.
[21] Srinivasan Parthasarathy,et al. Scalable graph clustering using stochastic flows: applications to community discovery , 2009, KDD.
[22] Raghav Kaushik,et al. Efficient exact set-similarity joins , 2006, VLDB.
[23] Jimmy J. Lin,et al. Pairwise Document Similarity in Large Collections with MapReduce , 2008, ACL.
[24] Ranieri Baraglia,et al. Document Similarity Self-Join with MapReduce , 2010, 2010 IEEE International Conference on Data Mining.
[25] Michael Isard,et al. DryadLINQ: A System for General-Purpose Distributed Data-Parallel Computing Using a High-Level Language , 2008, OSDI.
[26] Pete Wyckoff,et al. Hive - A Warehousing Solution Over a Map-Reduce Framework , 2009, Proc. VLDB Endow..
[27] R. Stanley. Enumerative Combinatorics: Volume 1 , 2011 .
[28] Weixiong Zhang,et al. An Efficient Spectral Algorithm for Network Community Discovery and Its Applications to Biological and Social Networks , 2007, Seventh IEEE International Conference on Data Mining (ICDM 2007).
[29] Andrei Z. Broder,et al. On the resemblance and containment of documents , 1997, Proceedings. Compression and Complexity of SEQUENCES 1997 (Cat. No.97TB100171).
[30] Piotr Indyk,et al. Approximate nearest neighbors: towards removing the curse of dimensionality , 1998, STOC '98.
[31] Geoffrey Zweig,et al. Syntactic Clustering of the Web , 1997, Comput. Networks.
[32] Sung-Hyuk Cha. Comprehensive Survey on Distance/Similarity Measures between Probability Density Functions , 2007 .
[33] Sanjay Ghemawat,et al. MapReduce: Simplified Data Processing on Large Clusters , 2004, OSDI.
[34] Monika Henzinger,et al. Finding near-duplicate web pages: a large-scale evaluation of algorithms , 2006, SIGIR.