Bottom-k and priority sampling, set similarity and subset sums with minimal independence
[1] Mario Szegedy, et al. The DLT priority sampling is essentially optimal, 2006, STOC '06.
[2] Stefan Savage, et al. Inside the Slammer Worm, 2003, IEEE Secur. Priv.
[3] Carsten Lund, et al. Learn more, sample less: control of volume and variance in network measurement, 2005, IEEE Transactions on Information Theory.
[4] Mikkel Thorup, et al. Tabulation-Based 5-Independent Hashing with Applications to Linear Probing and Second Moment Estimation, 2012, SIAM J. Comput.
[5] Yossi Matias, et al. Polynomial Hash Functions Are Reliable (Extended Abstract), 1992, ICALP.
[6] Andrei Z. Broder, et al. Identifying and Filtering Near-Duplicate Documents, 2000, CPM.
[7] Alan M. Frieze, et al. Min-Wise Independent Permutations, 2000, J. Comput. Syst. Sci.
[8] Larry Carter, et al. New classes and applications of hash functions, 1979, 20th Annual Symposium on Foundations of Computer Science (sfcs 1979).
[9] Ely Porat, et al. Exponential Space Improvement for minwise Based Algorithms, 2012, FSTTCS.
[10] Gurmeet Singh Manku, et al. Detecting near-duplicates for web crawling, 2007, WWW '07.
[11] Anna Pagh, et al. Linear probing with constant independence, 2006, STOC '07.
[12] Helen J. Wang, et al. Online aggregation, 1997, SIGMOD '97.
[13] Ely Porat, et al. Sketching Techniques for Collaborative Filtering, 2009, IJCAI.
[14] Mikkel Thorup, et al. Twisted Tabulation Hashing, 2013, SODA.
[15] Martin Dietzfelbinger, et al. Universal Hashing and k-Wise Independent Random Variables via Integer Arithmetic without Primes, 1996, STACS.
[16] S. Muthukrishnan, et al. Estimating Rarity and Similarity over Data Stream Windows, 2002, ESA.
[17] Russ Bubley, et al. Randomized algorithms, 1995, CSUR.
[18] Carsten Lund, et al. Priority sampling for estimation of arbitrary subset sums, 2007, JACM.
[19] Carl-Erik Särndal, et al. Model Assisted Survey Sampling, 1997.
[20] James N. Rosenau, et al. To learn more, 2004, IEEE Potentials.
[21] Kai-Min Chung, et al. Why simple hash functions work: exploiting the entropy in a data stream, 2008, SODA '08.
[22] Geoffrey Zweig, et al. Syntactic Clustering of the Web, 1997, Comput. Networks.
[23] Ely Porat, et al. Fast Pseudo-Random Fingerprints, 2010, arXiv.
[24] Ely Porat, et al. Even Better Framework for min-wise Based Algorithms, 2011, arXiv.
[25] Mikkel Thorup, et al. Confidence intervals for priority sampling, 2006, SIGMETRICS '06/Performance '06.
[27] Edith Cohen, et al. Summarizing data using bottom-k sketches, 2007, PODC '07.
[28] Edith Cohen, et al. Finding interesting associations without support pruning, 2000, Proceedings of 16th International Conference on Data Engineering (Cat. No.00CB37073).
[29] Monika Henzinger, et al. Finding near-duplicate web pages: a large-scale evaluation of algorithms, 2006, SIGIR.
[30] Daniel Shawcross Wilkerson, et al. Winnowing: local algorithms for document fingerprinting, 2003, SIGMOD '03.
[31] Ely Porat, et al. Sketching Algorithms for Approximating Rank Correlations in Collaborative Filtering Systems, 2009, SPIRE.
[32] Mark A. McComb. A Practical Guide to Heavy Tails, 2000, Technometrics.
[33] Mikkel Thorup, et al. On the k-Independence Required by Linear Probing and Minwise Independence, 2010, TALG.
[34] Carsten Lund, et al. Charging from sampled network usage, 2001, IMW '01.
[35] Andrei Z. Broder, et al. On the resemblance and containment of documents, 1997, Proceedings. Compression and Complexity of SEQUENCES 1997 (Cat. No.97TB100171).
[36] Larry Rudolph, et al. A Complexity Theory of Efficient Parallel Algorithms, 1990, Theor. Comput. Sci.
[37] Grace Hui Yang, et al. Near-duplicate detection by instance-level constrained clustering, 2006, SIGIR.
[38] Aravind Srinivasan, et al. Chernoff-Hoeffding bounds for applications with limited independence, 1995, SODA '93.
[39] Carsten Lund, et al. Variance optimal sampling based estimation of subset sums, 2008, arXiv.
[40] Luca Trevisan, et al. Counting Distinct Elements in a Data Stream, 2002, RANDOM.