Leveraging discarded samples for tighter estimation of multiple-set aggregates
[1] B. Rosén. Asymptotic Theory for Successive Sampling with Varying Probabilities Without Replacement, II , 1972 .
[2] Edith Cohen,et al. Size-Estimation Framework with Applications to Transitive Closure and Reachability , 1997, J. Comput. Syst. Sci..
[3] Joshua Alspector,et al. Improved robustness of signature-based near-replica detection via lexicon randomization , 2004, KDD.
[4] Edith Cohen,et al. Finding interesting associations without support pruning , 2000, Proceedings of 16th International Conference on Data Engineering (Cat. No.00CB37073).
[5] James Bennett,et al. The Netflix Prize , 2007 .
[6] Carsten Lund,et al. Variance optimal sampling based estimation of subset sums , 2008, ArXiv.
[7] Edith Cohen,et al. Spatially-decaying aggregation over a network , 2007, J. Comput. Syst. Sci..
[8] Gurmeet Singh Manku,et al. Detecting near-duplicates for web crawling , 2007, WWW '07.
[9] Moses Charikar,et al. Similarity estimation techniques from rounding algorithms , 2002, STOC '02.
[10] Andrei Z. Broder,et al. Identifying and Filtering Near-Duplicate Documents , 2000, CPM.
[11] Alan M. Frieze,et al. Min-Wise Independent Permutations , 2000, J. Comput. Syst. Sci..
[12] Edith Cohen,et al. Tighter estimation using bottom k sketches , 2008, Proc. VLDB Endow..
[13] Edith Cohen,et al. Efficient estimation algorithms for neighborhood variance and other moments , 2004, SODA '04.
[14] Ann Q. Gates,et al. TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING , 2005 .
[15] Phillip B. Gibbons. Distinct Sampling for Highly-Accurate Answers to Distinct Values Queries and Event Reports , 2001, VLDB.
[16] Devavrat Shah,et al. Computing separable functions via gossip , 2005, PODC '06.
[17] Edith Cohen,et al. Estimating Aggregates over Multiple Sets , 2008, 2008 Eighth IEEE International Conference on Data Mining.
[18] Peter J. Haas,et al. On synopses for distinct-value estimation under multiset operations , 2007, SIGMOD '07.
[19] Burton H. Bloom. Space/time trade-offs in hash coding with allowable errors , 1970 .
[20] Carsten Lund,et al. Priority sampling for estimation of arbitrary subset sums , 2007, JACM.
[21] Carl-Erik Särndal,et al. Model Assisted Survey Sampling , 1997 .
[22] Esbjörn Ohlsson. Sequential Poisson Sampling , 1999 .
[23] Paul G. Spirakis,et al. Weighted random sampling with a reservoir , 2006, Inf. Process. Lett..
[24] Theodore Johnson,et al. Mining database structure; or, how to build a data quality browser , 2002, SIGMOD '02.
[25] Bruce M. Maggs,et al. Efficient content location using interest-based locality in peer-to-peer systems , 2003, IEEE INFOCOM 2003. Twenty-second Annual Joint Conference of the IEEE Computer and Communications Societies (IEEE Cat. No.03CH37428).
[26] Jessica H. Fong,et al. An Approximate Lp Difference Algorithm for Massive Data Streams , 1999, Discret. Math. Theor. Comput. Sci..
[27] Mikkel Thorup,et al. On the Variance of Subset Sum Estimation , 2007, ESA.
[28] Edith Cohen,et al. Associative search in peer to peer networks: harnessing latent semantics , 2003, IEEE INFOCOM 2003. Twenty-second Annual Joint Conference of the IEEE Computer and Communications Societies (IEEE Cat. No.03CH37428).
[29] Mahesh Viswanathan,et al. An Approximate L1-Difference Algorithm for Massive Data Streams , 2002, SIAM J. Comput..
[30] Rajeev Motwani,et al. Towards estimation error guarantees for distinct values , 2000, PODS.
[31] Andrei Z. Broder,et al. Mirror, Mirror on the Web: A Study of Host Pairs with Replicated Content , 1999, Comput. Networks.
[32] Ophir Frieder,et al. Collection statistics for fast duplicate document detection , 2002, TOIS.
[33] Edith Cohen,et al. Spatially-decaying aggregation over a network: model and algorithms , 2004, SIGMOD '04.
[34] Edith Cohen, Yi-Min Wang, Gaurav Suri. When Piecewise Determinism Is Almost True , 1995 .
[35] Monika Henzinger,et al. Finding near-duplicate web pages: a large-scale evaluation of algorithms , 2006, SIGIR.
[36] K. Brewer,et al. SELECTING SEVERAL SAMPLES FROM A SINGLE POPULATION , 1972 .
[37] David Wetherall,et al. A protocol-independent technique for eliminating redundant network traffic , 2000, SIGCOMM.
[38] Srikanta Tirthapura,et al. Estimating simple functions on the union of data streams , 2001, SPAA '01.
[39] Edith Cohen,et al. Maintaining time-decaying stream aggregates , 2006, J. Algorithms.
[40] Burton H. Bloom,et al. Space/time trade-offs in hash coding with allowable errors , 1970, CACM.
[41] Tomasz Imielinski,et al. Mining association rules between sets of items in large databases , 1993, SIGMOD Conference.
[42] Udi Manber,et al. Finding Similar Files in a Large File System , 1994, USENIX Winter.
[43] D. Horvitz,et al. A Generalization of Sampling Without Replacement from a Finite Universe , 1952 .
[44] Jack G. Conrad,et al. Constructing a text corpus for inexact duplicate detection , 2004, SIGIR '04.
[45] J. Hájek,et al. Sampling from a finite population , 1982 .
[46] Daniel Shawcross Wilkerson,et al. Winnowing: local algorithms for document fingerprinting , 2003, SIGMOD '03.
[47] Mario Szegedy,et al. The DLT priority sampling is essentially optimal , 2006, STOC '06.
[48] Kenneth Ward Church,et al. A Sketch Algorithm for Estimating Two-Way and Multi-Way Associations , 2007, CL.
[49] Carsten Lund,et al. Variance optimal sampling based estimation of subset sums , 2008, ArXiv.
[50] Rakesh Agrawal,et al. Mining association rules between sets of items in large databases , 1993 .
[51] Andrei Z. Broder,et al. On the resemblance and containment of documents , 1997, Proceedings. Compression and Complexity of SEQUENCES 1997 (Cat. No.97TB100171).
[52] Edith Cohen,et al. Finding Interesting Associations without Support Pruning , 2001, IEEE Trans. Knowl. Data Eng..
[53] B. Rosén. Asymptotic theory for order sampling , 1997 .
[54] Sandhya Dwarkadas,et al. Peer-to-peer information retrieval using self-organizing semantic overlay networks , 2003, SIGCOMM '03.
[55] Haim Kaplan,et al. Randomized incremental constructions of three-dimensional convex hulls and planar voronoi diagrams, and approximate range counting , 2006, SODA '06.
[56] Li Fan,et al. Summary cache: a scalable wide-area web cache sharing protocol , 2000, TNET.
[57] Xiaohui Yu,et al. Hashed samples: selectivity estimators for set similarity selection queries , 2008, Proc. VLDB Endow..
[58] Edith Cohen,et al. Summarizing data using bottom-k sketches , 2007, PODC '07.
[59] B. Rosén. On sampling with probability proportional to size , 1997 .