Randomized Primitives for Big Data Processing
[1] Rosa Meo. Maximum independence and mutual information , 2002, IEEE Trans. Inf. Theory.
[2] Marianne Winslett,et al. Multi-resolution bitmap indexes for scientific data , 2007, TODS.
[3] Bingsheng He,et al. Cache-oblivious nested-loop joins , 2006, CIKM '06.
[4] Petra Perner,et al. Data Mining - Concepts and Techniques , 2002, Künstliche Intell..
[5] Rina Panigrahy,et al. Entropy based nearest neighbor search in high dimensions , 2005, SODA '06.
[6] Rasmus Pagh,et al. Better Size Estimation for Sparse Matrix Products , 2010, Algorithmica.
[7] Mark A. Iwen,et al. A note on compressed sensing and the complexity of matrix multiplication , 2009, Inf. Process. Lett..
[8] Jeffrey Scott Vitter,et al. Algorithms and Data Structures for External Memory , 2008, Found. Trends Theor. Comput. Sci..
[9] Roberto J. Bayardo,et al. Scaling up all pairs similarity search , 2007, WWW '07.
[10] Rosa Meo. Theory of dependence values , 2000, TODS.
[11] Alan M. Frieze,et al. Min-Wise Independent Permutations , 2000, J. Comput. Syst. Sci..
[12] David P. Woodruff,et al. Tight bounds for distributed functional monitoring , 2011, STOC '12.
[13] Andrei Z. Broder,et al. On the resemblance and containment of documents , 1997, Proceedings. Compression and Complexity of SEQUENCES 1997 (Cat. No.97TB100171).
[14] Alan Siegel,et al. On Universal Classes of Extremely Random Constant-Time Hash Functions , 1995, SIAM J. Comput..
[15] Noga Alon,et al. Finding and counting given length cycles , 1997, Algorithmica.
[16] Christos Faloutsos,et al. V-SMART-Join: A Scalable MapReduce Framework for All-Pair Similarity Joins of Multisets and Vectors , 2012, Proc. VLDB Endow..
[17] E. Jaynes. Information Theory and Statistical Mechanics , 1957 .
[18] Andrew Chi-Chih Yao,et al. Some complexity questions related to distributive computing (Preliminary Report) , 1979, STOC.
[19] Burton H. Bloom,et al. Space/time trade-offs in hash coding with allowable errors , 1970, CACM.
[20] Florin Rusu,et al. Sketches for size of join estimation , 2008, TODS.
[21] Mikkel Thorup,et al. Bottom-k and priority sampling, set similarity and subset sums with minimal independence , 2013, STOC '13.
[22] Edith Cohen,et al. Coordinated Weighted Sampling for Estimating Aggregates Over Multiple Weight Assignments , 2009, Proc. VLDB Endow..
[23] Mark Braverman,et al. Information Lower Bounds via Self-Reducibility , 2015, Theory of Computing Systems.
[24] Ping Li,et al. b-Bit minwise hashing , 2009, WWW '10.
[25] Andrea Asperti,et al. A proof of Bertrand's postulate , 2012, J. Formaliz. Reason..
[26] Nikolaj Tatti,et al. Maximum entropy based significance of itemsets , 2007, Seventh IEEE International Conference on Data Mining (ICDM 2007).
[27] Jilles Vreeken,et al. Tell me what I need to know: succinctly summarizing data with itemsets , 2011, KDD.
[28] Rasmus Pagh,et al. Faster join-projects and sparse matrix multiplications , 2009, ICDT '09.
[29] Ashish Goel,et al. Efficient distributed locality sensitive hashing , 2012, CIKM.
[30] Peter Bro Miltersen,et al. Is linear hashing good? , 1997, STOC '97.
[31] Andrzej Lingas,et al. A Fast Output-Sensitive Algorithm for Boolean Matrix Multiplication , 2011, Algorithmica.
[32] Ping Li,et al. b-Bit Minwise Hashing for Estimating Three-Way Similarities , 2010, NIPS.
[33] David P. Woodruff,et al. Tight lower bounds for the distinct elements problem , 2003, 44th Annual IEEE Symposium on Foundations of Computer Science, 2003. Proceedings..
[34] Yi Wu,et al. Optimal Lower Bounds for Locality-Sensitive Hashing (Except When q is Tiny) , 2014, TOCT.
[35] Raghav Kaushik,et al. Efficient exact set-similarity joins , 2006, VLDB.
[36] Ke Yi,et al. Beyond simple aggregates: indexing for summary queries , 2011, PODS.
[37] Tomasz Imielinski,et al. Mining association rules between sets of items in large databases , 1993, SIGMOD Conference.
[38] Rasmus Pagh,et al. The Input/Output Complexity of Sparse Matrix Multiplication , 2014, ESA.
[39] Anna Pagh,et al. Uniform Hashing in Constant Time and Optimal Space , 2008, SIAM J. Comput..
[40] Rajeev Motwani,et al. Lower bounds on locality sensitive hashing , 2005, SCG '06.
[41] Yeye He,et al. ClusterJoin: A Similarity Joins Framework using Map-Reduce , 2014, Proc. VLDB Endow..
[42] Riko Jacob,et al. The I/O Complexity of Sparse Matrix Dense Matrix Multiplication , 2010, LATIN.
[43] Dror Irony,et al. Communication lower bounds for distributed-memory matrix multiplication , 2004, J. Parallel Distributed Comput..
[44] Michael A. Bender,et al. Optimal Sparse Matrix Dense Vector Multiplication in the I/O-Model , 2007, SPAA '07.
[45] Rasmus Pagh. Low Redundancy in Static Dictionaries with Constant Query Time , 2001, SIAM J. Comput..
[46] Alessandro Panconesi,et al. Concentration of Measure for the Analysis of Randomized Algorithms , 2009 .
[47] John D. Lafferty,et al. Inducing Features of Random Fields , 1995, IEEE Trans. Pattern Anal. Mach. Intell..
[48] A. Razborov. Communication Complexity , 2011 .
[49] Monika Henzinger,et al. Finding near-duplicate web pages: a large-scale evaluation of algorithms , 2006, SIGIR.
[50] T. E. Harris,et al. The Theory of Branching Processes , 1963 .
[51] Amit Chakrabarti,et al. An Optimal Lower Bound on the Communication Complexity of Gap-Hamming-Distance , 2012, SIAM J. Comput..
[52] Graham Cormode,et al. An improved data stream summary: the count-min sketch and its applications , 2004, J. Algorithms.
[53] Srinivasan Parthasarathy,et al. Scalable all-pairs similarity search in metric spaces , 2013, KDD.
[54] Silvio Lattanzi,et al. On compressing social networks , 2009, KDD.
[55] Aravind Srinivasan,et al. Chernoff-Hoeffding bounds for applications with limited independence , 1995, SODA '93.
[56] Geoffrey Zweig,et al. Syntactic Clustering of the Web , 1997, Comput. Networks.
[57] Ely Porat,et al. Fast set intersection and two-patterns matching , 2009, Theor. Comput. Sci..
[58] Rasmus Pagh,et al. Generating k-Independent Variables in Constant Time , 2014, 2014 IEEE 55th Annual Symposium on Foundations of Computer Science.
[59] Vijay V. Vazirani,et al. Matching is as easy as matrix inversion , 1987, STOC.
[60] Mikkel Thorup,et al. On the k-Independence Required by Linear Probing and Minwise Independence , 2010, TALG.
[61] Mathias Bæk Tejs Knudsen,et al. Quicksort, Largest Bucket, and Min-Wise Hashing with Limited Independence , 2015, ESA.
[62] Mikkel Thorup,et al. Tabulation Based 5-Universal Hashing and Linear Probing , 2010, ALENEX.
[63] François Le Gall,et al. Powers of tensors and fast matrix multiplication , 2014, ISSAC.
[64] Nikolaj Tatti,et al. Computational complexity of queries based on itemsets , 2006, Inf. Process. Lett..
[65] T. S. Jayram. Information complexity: a tutorial , 2010, PODS '10.
[66] Rasmus Pagh,et al. The input/output complexity of triangle enumeration , 2013, PODS.
[67] Eli Upfal,et al. Space-round tradeoffs for MapReduce computations , 2011, ICS '12.
[68] Timothy M. Chan. Speeding up the Four Russians Algorithm by About One More Logarithmic Factor , 2015, SODA.
[69] Don Coppersmith,et al. Matrix multiplication via arithmetic progressions , 1987, STOC.
[70] C. N. Liu,et al. Approximating discrete probability distributions with dependence trees , 1968, IEEE Trans. Inf. Theory.
[71] Raphael Yuster,et al. Fast sparse matrix multiplication , 2004, TALG.
[72] Robert S. Boyer,et al. MJRTY: A Fast Majority Vote Algorithm , 1991, Automated Reasoning: Essays in Honor of Woody Bledsoe.
[73] Desh Ranjan,et al. Balls and bins: A study in negative dependence , 1996, Random Struct. Algorithms.
[74] Ping Li,et al. One Permutation Hashing for Efficient Search and Learning , 2012, ArXiv.
[75] Rasmus Pagh,et al. Compressed matrix multiplication , 2011, ITCS '12.
[76] Riko Jacob,et al. Fast Output-Sensitive Matrix Multiplication , 2015, ESA.
[77] Toon Calders,et al. Non-derivable itemset mining , 2007, Data Mining and Knowledge Discovery.
[78] David P. Woodruff,et al. An optimal algorithm for the distinct elements problem , 2010, PODS '10.
[79] Martti Penttonen,et al. A Reliable Randomized Algorithm for the Closest-Pair Problem , 1997, J. Algorithms.
[80] Edith Cohen,et al. Leveraging discarded samples for tighter estimation of multiple-set aggregates , 2009, SIGMETRICS '09.
[81] Chen Li,et al. Efficient parallel set-similarity joins using MapReduce , 2010, SIGMOD Conference.
[82] Mikkel Thorup,et al. Simple Tabulation, Fast Expanders, Double Tabulation, and High Independence , 2013, 2013 IEEE 54th Annual Symposium on Foundations of Computer Science.
[83] Alok Aggarwal,et al. The input/output complexity of sorting and related problems , 1988, CACM.
[84] Gerth Stølting Brodal,et al. Cache-Oblivious Algorithms and Data Structures , 2004, SWAT.
[85] A. J. Stothers. On the complexity of matrix multiplication , 2010 .
[86] H. T. Kung,et al. I/O complexity: The red-blue pebble game , 1981, STOC '81.
[87] Xin-She Yang,et al. Introduction to Algorithms , 2021, Nature-Inspired Optimization Algorithms.
[88] V. Strassen. Gaussian elimination is not optimal , 1969 .
[89] Rasmus Pagh,et al. I/O-Efficient Similarity Join , 2015, ESA.
[90] A. Joffe. On a Set of Almost Deterministic $k$-Independent Random Variables , 1974 .
[91] Divyakant Agrawal,et al. Detectives: detecting coalition hit inflation attacks in advertising networks streams , 2007, WWW '07.
[92] Ping Li,et al. Theory and applications of b-bit minwise hashing , 2011, Commun. ACM.
[93] Philip Bille,et al. Fast Evaluation of Union-Intersection Expressions , 2007, ISAAC.
[94] Mikkel Thorup,et al. The power of simple tabulation hashing , 2010, STOC.
[95] Surajit Chaudhuri,et al. A Primitive Operator for Similarity Joins in Data Cleaning , 2006, 22nd International Conference on Data Engineering (ICDE'06).
[96] Charles E. Leiserson,et al. Cache-Oblivious Algorithms , 2003, CIAC.
[97] Gero Greiner,et al. Sparse Matrix Computations and their I/O Complexity , 2012 .
[98] Piotr Indyk,et al. Approximate nearest neighbors: towards removing the curse of dimensionality , 1998, STOC '98.
[99] Edith Cohen,et al. Estimating the size of the transitive closure in linear time , 1994, Proceedings 35th Annual Symposium on Foundations of Computer Science.
[100] Edith Cohen,et al. Structure Prediction and Computation of Sparse Matrix Products , 1998, J. Comb. Optim..
[101] Panos Kalnis,et al. Quality and efficiency in high dimensional nearest neighbor search , 2009, SIGMOD Conference.
[102] David P. Woodruff. Optimal space lower bounds for all frequency moments , 2004, SODA '04.
[103] David P. Woodruff,et al. Is min-wise hashing optimal for summarizing set intersection? , 2014, PODS.
[104] Ryan Williams,et al. Finding orthogonal vectors in discrete structures , 2014, SODA.
[105] Mikkel Thorup. Even strongly universal hashing is pretty fast , 2000, SODA '00.
[106] C. Papadimitriou,et al. The complexity of massive data set computations , 2002 .
[107] S. Dongen. Graph clustering by flow simulation , 2000 .
[108] Russ Bubley,et al. Randomized algorithms , 1995, CSUR.
[109] Jeffrey Xu Yu,et al. Efficient similarity joins for near-duplicate detection , 2011, TODS.
[110] Piotr Indyk,et al. Similarity Search in High Dimensions via Hashing , 1999, VLDB.
[111] Larry Carter,et al. Universal Classes of Hash Functions , 1979, J. Comput. Syst. Sci..
[112] Anna Pagh,et al. Linear probing with constant independence , 2006, STOC '07.
[113] Moses Charikar,et al. Similarity estimation techniques from rounding algorithms , 2002, STOC '02.
[114] Edith Cohen,et al. Finding interesting associations without support pruning , 2000, Proceedings of 16th International Conference on Data Engineering (Cat. No.00CB37073).
[115] Vijay V. Vazirani,et al. Maximum Matchings in General Graphs Through Randomization , 1989, J. Algorithms.
[116] Nicole Immorlica,et al. Locality-sensitive hashing scheme based on p-stable distributions , 2004, SCG '04.
[117] Virginia Vassilevska Williams,et al. Multiplying matrices faster than Coppersmith-Winograd , 2012, STOC '12.
[118] Alexandr Andoni,et al. Near-Optimal Hashing Algorithms for Approximate Nearest Neighbor in High Dimensions , 2006, 2006 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS'06).
[119] Rasmus Pagh,et al. Association Rule Mining using Maximum Entropy , 2015, ArXiv.
[120] Noam Nisan,et al. On Randomized One-round Communication Complexity , 1995, STOC '95.
[121] Srinivasan Parthasarathy,et al. Bayesian Locality Sensitive Hashing for Fast Similarity Search , 2011, Proc. VLDB Endow..