Sampling algorithms for evolving datasets
[1] P. Haas,et al. Estimating the Number of Classes in a Finite Population , 1998 .
[2] Peter J. Haas,et al. A bi-level Bernoulli scheme for database sampling , 2004, SIGMOD '04.
[3] S. Muthukrishnan,et al. Data streams: algorithms and applications , 2005, SODA '03.
[4] Noga Alon,et al. The space complexity of approximating the frequency moments , 1996, STOC '96.
[5] Anany Levitin,et al. Introduction to the Design and Analysis of Algorithms , 2002 .
[6] Brian A. Carter,et al. Advanced Encryption Standard , 2007 .
[7] Takuji Nishimura,et al. Mersenne twister: a 623-dimensionally equidistributed uniform pseudo-random number generator , 1998, TOMC.
[8] Paul Brown,et al. BHUNT: Automatic Discovery of Fuzzy Algebraic Constraints in Relational Data , 2003, VLDB.
[9] Surajit Chaudhuri,et al. A robust, optimization-based approach for approximate answering of aggregate queries , 2001, SIGMOD '01.
[10] Yossi Matias,et al. DIMACS Series in Discrete Mathematics and Theoretical Computer Science Synopsis Data Structures for Massive Data , 2007 .
[11] Torsten Suel,et al. Optimal Histograms with Quality Guarantees , 1998, VLDB.
[12] Rajeev Motwani,et al. Sampling from a moving window over streaming data , 2002, SODA '02.
[13] Bin Chen,et al. Efficient Data-Reduction Methods for On-line Association Rule Discovery , 2004 .
[14] Christos Faloutsos,et al. Density biased sampling: an improved method for data mining and clustering , 2000, SIGMOD '00.
[15] Yossi Matias,et al. Fast incremental maintenance of approximate histograms , 1997, TODS.
[16] Carsten Lund,et al. Charging from sampled network usage , 2001, IMW '01.
[17] K. Aiyappan Nair. An Improved Algorithm for Ordered Sequential Random Sampling , 1990, TOMS.
[18] Calisto Zuzarte,et al. Query sampling in DB2 Universal Database , 2004, SIGMOD '04.
[19] Surajit Chaudhuri,et al. Effective use of block-level sampling in statistics estimation , 2004, SIGMOD '04.
[20] Heikki Mannila,et al. The power of sampling in knowledge discovery , 1994, PODS '94.
[21] P. Haas. Speeding up DB2 UDB Using Sampling , 2003 .
[22] Philippe Flajolet,et al. Probabilistic Counting Algorithms for Data Base Applications , 1985, J. Comput. Syst. Sci..
[23] A. I. McLeod,et al. A Convenient Algorithm for Drawing a Simple Random Sample , 1983 .
[24] Peter J. Haas,et al. Maintaining bernoulli samples over evolving multisets , 2007, PODS '07.
[25] Wen-Chi Hou,et al. Statistical estimators for relational algebra expressions , 1988, PODS '88.
[26] Theodore Johnson,et al. Mining database structure; or, how to build a data quality browser , 2002, SIGMOD '02.
[27] S. B. Yao,et al. Approximating block accesses in database organizations , 1977, CACM.
[28] Cecilia R. Aragon,et al. Randomized search trees , 2005, Algorithmica.
[29] Paul Brown,et al. CORDS: automatic discovery of correlations and soft functional dependencies , 2004, SIGMOD '04.
[30] Tian Zhang,et al. BIRCH: an efficient data clustering method for very large databases , 1996, SIGMOD '96.
[31] Mervin E. Muller,et al. Development of Sampling Plans by Using Sequential (Item by Item) Selection Techniques and Digital Computers , 1962 .
[32] Phillip B. Gibbons. Distinct Sampling for Highly-Accurate Answers to Distinct Values Queries and Event Reports , 2001, VLDB.
[33] Peter J. Haas,et al. Improved histograms for selectivity estimation of range predicates , 1996, SIGMOD '96.
[34] Chris Jermaine,et al. Robust Estimation With Sampling and Approximate Pre-Aggregation , 2003, VLDB.
[35] Rajeev Motwani,et al. Computing Iceberg Queries Efficiently , 1998, VLDB.
[36] Peter J. Haas,et al. On synopses for distinct-value estimation under multiset operations , 2007, SIGMOD '07.
[37] David J. DeWitt,et al. Practical Skew Handling in Parallel Joins , 1992, VLDB.
[38] Peter J. Haas,et al. An Estimator of Number of Species from Quadrat Sampling , 2006, Biometrics.
[39] Sridhar Ramaswamy,et al. Join synopses for approximate query answering , 1999, SIGMOD '99.
[40] Rajeev Motwani,et al. On Sampling and Relational Operators , 1999, IEEE Data Eng. Bull..
[41] R. J. Serfling. Approximation Theorems of Mathematical Statistics , 1980 .
[42] Luca Trevisan,et al. Counting Distinct Elements in a Data Stream , 2002, RANDOM.
[43] Peter J. Haas,et al. Ripple joins for online aggregation , 1999, SIGMOD '99.
[44] Hans-Peter Kriegel,et al. A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise , 1996, KDD.
[45] Jeffrey F. Naughton,et al. On the relative cost of sampling for join selectivity estimation , 1994, PODS '94.
[46] Mikkel Thorup,et al. Tabulation based 4-universal hashing with applications to second moment estimation , 2004, SODA '04.
[47] Theodore Johnson,et al. Sampling algorithms in a stream operator , 2005, SIGMOD '05.
[48] Wolfgang Lehner,et al. Linked Bernoulli Synopses: Sampling along Foreign Keys , 2008, SSDBM.
[49] Peter J. Haas,et al. Techniques for Warehousing of Sample Data , 2006, 22nd International Conference on Data Engineering (ICDE'06).
[50] Wolfgang Lehner,et al. Designing Random Sample Synopses with Outliers , 2008, 2008 IEEE 24th International Conference on Data Engineering.
[51] L. Devroye. Non-Uniform Random Variate Generation , 1986 .
[52] Jeffrey Scott Vitter,et al. Faster methods for random sampling , 1984, CACM.
[53] Gregory Piatetsky-Shapiro,et al. Accurate estimation of the number of tuples satisfying a condition , 1984, SIGMOD '84.
[54] Helen J. Wang,et al. Online aggregation , 1997, SIGMOD '97.
[55] David J. DeWitt,et al. Parallel sorting on a shared-nothing architecture using probabilistic splitting , 1991, [1991] Proceedings of the First International Conference on Parallel and Distributed Information Systems.
[56] Wen-Chi Hou,et al. Statistical estimators for aggregate relational algebra queries , 1991, TODS.
[57] Tomasz Imielinski,et al. Mining association rules between sets of items in large databases , 1993, SIGMOD Conference.
[58] Voratas Kachitvichyanukul,et al. Computer generation of hypergeometric random variates , 1985 .
[59] Olli Nevalainen,et al. Two efficient algorithms for random sampling without replacement , 1982 .
[60] Peter J. Haas,et al. A dip in the reservoir: maintaining sample synopses of evolving datasets , 2006, VLDB.
[61] S. Seshadri. Probabilistic methods in query processing , 1992 .
[62] Jeffrey F. Naughton,et al. Query size estimation by adaptive sampling (extended abstract) , 1990, PODS.
[63] Alan M. Frieze,et al. Min-Wise Independent Permutations , 2000, J. Comput. Syst. Sci..
[64] Bin Chen,et al. A new two-phase sampling based algorithm for discovering association rules , 2002, KDD.
[65] Yannis E. Ioannidis,et al. The History of Histograms (abridged) , 2003, VLDB.
[66] Viswanath Poosala,et al. Congressional samples for approximate answering of group-by queries , 2000, SIGMOD '00.
[67] Joachim H. Ahrens,et al. Sequential random sampling , 1985, TOMS.
[68] Sudipto Guha,et al. CURE: an efficient clustering algorithm for large databases , 1998, SIGMOD '98.
[69] M. H. Hansen,et al. On the Theory of Sampling from Finite Populations , 1943 .
[70] Frank Olken,et al. Random Sampling from Databases , 1993 .
[71] Suman Nath,et al. Online maintenance of very large random samples on flash storage , 2009, The VLDB Journal.
[72] Dan E. Willard,et al. Optimal sample cost residues for differential database batch query problems , 1991, JACM.
[73] Andrei Z. Broder,et al. On the resemblance and containment of documents , 1997, Proceedings. Compression and Complexity of SEQUENCES 1997 (Cat. No.97TB100171).
[74] Rainer Gemulla. Sampling Algorithms for Evolving Datasets , 2008 .
[75] A. C. Bebbington,et al. A Simple Method of Drawing a Sample Without Replacement , 1975 .
[76] Surajit Chaudhuri,et al. Optimized stratified sampling for approximate query processing , 2007, TODS.
[77] Peter J. Haas,et al. Hoeffding inequalities for join-selectivity estimation and online aggregation , 1996 .
[78] Surajit Chaudhuri,et al. Dynamic sample selection for approximate query processing , 2003, SIGMOD '03.
[79] Jeffrey F. Naughton,et al. Practical selectivity estimation through adaptive sampling , 1990, SIGMOD '90.
[80] Stavros Christodoulakis,et al. Estimating block transfers and join sizes , 1983, SIGMOD '83.
[81] Chris Jermaine,et al. Maintaining very large random samples using the geometric file , 2008, The VLDB Journal.
[82] Graham Cormode,et al. Summarizing and Mining Inverse Distributions on Data Streams via Dynamic Inverse Sampling , 2005, VLDB.
[83] Mervin E. Muller. The use of computers in inspection procedures , 1958, CACM.
[84] Rajeev Motwani,et al. Towards estimation error guarantees for distinct values , 2000, PODS.
[85] Charu C. Aggarwal,et al. On biased reservoir sampling in the presence of stream evolution , 2006, VLDB.
[86] Doron Rotem,et al. Random Sampling from B+ Trees , 1989, VLDB.
[87] Jeffrey F. Naughton,et al. Selectivity and Cost Estimation for Joins Based on Random Sampling , 1996, J. Comput. Syst. Sci..
[88] Chris Jermaine,et al. A disk-based join with probabilistic guarantees , 2005, SIGMOD '05.
[89] Rajeev Motwani,et al. Random sampling for histogram construction: how much is enough? , 1998, SIGMOD '98.
[90] R. Agrawal. Fast Algorithms for Mining Association Rules , 1994, VLDB.
[91] Rajeev Motwani,et al. Overcoming limitations of sampling for aggregation queries , 2001, Proceedings 17th International Conference on Data Engineering.
[92] Larry Carter,et al. Universal classes of hash functions (Extended Abstract) , 1977, STOC '77.
[93] William H. Press,et al. Numerical recipes in C (2nd ed.): the art of scientific computing , 1992 .
[94] T. Shinozaki,et al. Constructing an Optimal Family of Min-Wise Independent Permutations , 2000 .
[95] D. Horvitz,et al. A Generalization of Sampling Without Replacement from a Finite Universe , 1952 .
[96] David J. DeWitt,et al. An Evaluation of Non-Equijoin Algorithms , 1991, VLDB.
[97] Piotr Indyk,et al. A small approximately min-wise independent family of hash functions , 1999, SODA '99.
[98] Chris Jermaine,et al. Sampling-based estimators for subset-based queries , 2008, The VLDB Journal.
[99] Yossi Matias,et al. New sampling-based summary statistics for improving approximate query answers , 1998, SIGMOD '98.
[100] Edith Cohen,et al. Size-Estimation Framework with Applications to Transitive Closure and Reachability , 1997, J. Comput. Syst. Sci..
[101] Usama M. Fayyad,et al. Knowledge Discovery in Databases: An Overview , 1997, ILP.
[102] P.J. Haas,et al. Sampling-based selectivity estimation for joins using augmented frequent value statistics , 1995, Proceedings of the Eleventh International Conference on Data Engineering.
[103] D. DeWitt,et al. Equi-depth multidimensional histograms , 1988, SIGMOD '88.
[104] Chris Jermaine,et al. Scalable approximate query processing with the DBO engine , 2008, TODS.
[105] J. MacQueen. Some methods for classification and analysis of multivariate observations , 1967 .
[106] Russ Bubley,et al. Randomized algorithms , 1995, CSUR.
[107] S. Muthukrishnan,et al. Estimating Rarity and Similarity over Data Stream Windows , 2002, ESA.
[108] Rajeev Rastogi,et al. Data Stream Management: Processing High-Speed Data Streams (Data-Centric Systems and Applications) , 2019 .
[109] Dorothy E. Denning,et al. Secure statistical databases with random sample queries , 1980, TODS.
[110] Charu C. Aggarwal,et al. Data Streams: Models and Algorithms (Advances in Database Systems) , 2006 .
[111] Ashish Gupta,et al. Materialized views: techniques, implementations, and applications , 1999 .
[112] Ruoming Jin,et al. New Sampling-Based Estimators for OLAP Queries , 2006, 22nd International Conference on Data Engineering (ICDE'06).
[113] Pierre L'Ecuyer,et al. Uniform random number generation , 1994, Ann. Oper. Res..
[114] F. Olken,et al. Maintenance of materialized views of sampling queries , 1992, [1992] Eighth International Conference on Data Engineering.
[115] Fei Xu,et al. Confidence bounds for sampling-based group by estimates , 2008, TODS.
[116] Wolfgang Lehner,et al. Sampling time-based sliding windows in bounded space , 2008, SIGMOD Conference.
[117] Muhammad Hanif,et al. Sampling with Unequal Probabilities without Replacement: A Review , 1980 .
[118] R. S. Pinkham. An Efficient Algorithm for Drawing a Simple Random Sample , 1987 .
[119] Peter J. Haas,et al. Maintaining bounded-size sample synopses of evolving datasets , 2008, The VLDB Journal.
[120] Peter J. Haas,et al. Large-sample and deterministic confidence intervals for online aggregation , 1997, Proceedings. Ninth International Conference on Scientific and Statistical Database Management (Cat. No.97TB100150).
[121] Jeffrey Scott Vitter,et al. An efficient algorithm for sequential random sampling , 1987, TOMS.
[122] J. Bunge,et al. Estimating the Number of Species: A Review , 1993 .
[123] Stefan Berchtold,et al. An efficient approximation scheme for data mining tasks , 2001, Proceedings 17th International Conference on Data Engineering.
[124] Yossi Matias,et al. Bifocal sampling for skew-resistant join size estimation , 1996, SIGMOD '96.
[125] M. Grossglauser,et al. Trajectory sampling for direct traffic observation , 2000 .
[126] Gennady Antoshenkov,et al. Random Sampling from Pseudo-Ranked B+ Trees , 1992, VLDB.
[127] Wolfgang Lehner,et al. Cardinality estimation using sample views with quality assurance , 2007, SIGMOD '07.
[128] Jeffrey F. Naughton,et al. Sampling-Based Estimation of the Number of Distinct Values of an Attribute , 1995, VLDB.
[129] Ali S. Hadi,et al. Finding Groups in Data: An Introduction to Cluster Analysis , 1991 .
[130] Sumit Ganguly,et al. Counting distinct items over update streams , 2005, Theor. Comput. Sci..
[131] Mikkel Thorup. Even strongly universal hashing is pretty fast , 2000, SODA '00.
[132] Armido R. Didonato,et al. Algorithm 708: Significant digit computation of the incomplete beta function ratios , 1988, TOMS.
[133] Wei Sun,et al. An evaluation of sampling-based size estimation methods for selections in database systems , 1995, Proceedings of the Eleventh International Conference on Data Engineering.
[134] Bin Chen,et al. Efficient data reduction with EASE , 2003, KDD '03.
[135] Lukasz Golab,et al. Issues in data stream management , 2003, SIGMOD Rec..
[136] Jeffrey Scott Vitter,et al. Random sampling with a reservoir , 1985, TOMS.
[137] Aristides Gionis,et al. Clustering Aggregation , 2005, ICDE.
[138] Terence G. Jones,et al. A note on sampling a tape-file , 1962, Commun. ACM.
[139] Noga Alon,et al. Tracking join and self-join sizes in limited storage , 1999, PODS '99.
[140] To-Yat Cheung. Estimating block accesses and number of records in file management , 1982, CACM.
[141] Carl-Erik Särndal,et al. Model Assisted Survey Sampling , 1997 .
[142] Mong-Li Lee,et al. ICICLES: Self-Tuning Samples for Approximate Query Answering , 2000, VLDB.
[143] Kai-Min Chung,et al. Why simple hash functions work: exploiting the entropy in a data stream , 2008, SODA '08.
[144] A. Bissell. Ordered Random Selection without Replacement , 1986 .
[145] Wen-Chi Hou,et al. Error-constrained COUNT query evaluation in relational databases , 1991, SIGMOD '91.
[146] Hannu Toivonen,et al. Sampling Large Databases for Association Rules , 1996, VLDB.
[147] Peter Hellekalek,et al. Empirical evidence concerning AES , 2003, TOMC.
[148] Jeffrey F. Naughton,et al. Synopses for query optimization: A space-complexity perspective , 2004, TODS.
[149] Yufei Tao,et al. Random Sampling for Continuous Streams with Arbitrary Updates , 2007 .
[150] Piotr Indyk,et al. Sampling in dynamic data streams and applications , 2005, Int. J. Comput. Geom. Appl..
[151] Xiaohui Yu,et al. Hashed samples: selectivity estimators for set similarity selection queries , 2008, Proc. VLDB Endow..
[152] J. Rao. On the Comparison of Sampling with and without Replacement , 1966 .
[153] Michael M. Strand. Estimation of a Population Total under a “Bernoulli Sampling” Procedure , 1979 .
[154] Peter J. Haas,et al. Sequential sampling procedures for query size estimation , 1992, SIGMOD '92.
[155] C. Pipper,et al. ["R"-project for statistical computing]. , 2008, Ugeskrift for laeger.
[156] Chris Jermaine,et al. Online maintenance of very large random samples , 2004, SIGMOD '04.
[157] Wolfgang Lehner,et al. Deferred Maintenance of Disk-Based Random Samples , 2006, EDBT.
[158] Ping Xu,et al. Random sampling from hash files , 1990, SIGMOD '90.
[159] Olli Nevalainen,et al. An Algorithm for Unbiased Random Sampling , 1982, Comput. J..
[160] Doron Rotem,et al. Simple Random Sampling from Relational Databases , 1986, VLDB.
[161] Kim-Hung Li,et al. Reservoir-sampling algorithms of time complexity O(n(1 + log(N/n))) , 1994, TOMS.
[162] Jennifer Widom,et al. The CQL continuous query language: semantic foundations and query execution , 2006, The VLDB Journal.
[163] Jeffrey F. Naughton,et al. End-biased Samples for Join Cardinality Estimation , 2006, 22nd International Conference on Data Engineering (ICDE'06).
[164] Farshad Fotouhi,et al. Computation of partial query results with an adaptive stratified sampling technique , 1995, CIKM '95.
[165] Alfonso F. Cardenas. Analysis and performance of inverted data base structures , 1975, CACM.
[166] Srikanta Tirthapura,et al. Estimating simple functions on the union of data streams , 2001, SPAA '01.
[167] Jeffrey F. Naughton,et al. Fixed-precision estimation of join selectivity , 1993, PODS '93.