Mining of Massive Datasets
[1] Philip S. Yu,et al. An effective hash-based algorithm for mining association rules , 1995, SIGMOD '95.
[2] Chris Anderson,et al. The Long Tail: Why the Future of Business is Selling Less of More , 2006 .
[3] Hinrich Schütze,et al. Introduction to information retrieval , 2008 .
[4] Luis Mateus Rocha,et al. Singular value decomposition and principal component analysis , 2003 .
[5] Philippe Flajolet,et al. Probabilistic counting , 1983, 24th Annual Symposium on Foundations of Computer Science (sfcs 1983).
[6] Ulrike von Luxburg,et al. A tutorial on spectral clustering , 2007, Stat. Comput..
[7] Paul S. Bradley,et al. Scaling Clustering Algorithms to Large Databases , 1998, KDD.
[8] Jennifer Widom,et al. A First Course in Database Systems , 1997 .
[9] Thorsten Joachims,et al. Training linear SVMs in linear time , 2006, KDD '06.
[10] Jeffrey D. Ullman,et al. Transitive closure and recursive Datalog implemented on clusters , 2012, EDBT '12.
[11] Petros Drineas,et al. Tensor-CUR decompositions for tensor-based data , 2006, KDD '06.
[12] Petros Drineas,et al. Fast Monte Carlo Algorithms for Matrices III: Computing a Compressed Approximate Matrix Decomposition , 2004 .
[13] Hannu Toivonen,et al. Sampling Large Databases for Association Rules , 1996, VLDB.
[14] Taher H. Haveliwala. Efficient Computation of PageRank , 1999 .
[15] Andreas Paepcke,et al. SpotSigs: robust and efficient near duplicate detection in large web collections , 2008, SIGIR '08.
[16] Richard C. Singleton,et al. Nonrandom binary superimposed codes , 1964, IEEE Trans. Inf. Theory.
[17] Jure Leskovec,et al. Supervised random walks: predicting and recommending links in social networks , 2010, WSDM '11.
[18] Phillip B. Gibbons. Distinct Sampling for Highly-Accurate Answers to Distinct Values Queries and Event Reports , 2001, VLDB.
[19] Shamkant B. Navathe,et al. An Efficient Algorithm for Mining Association Rules in Large Databases , 1995, VLDB.
[20] Abraham Silberschatz,et al. View maintenance issues for the chronicle data model (extended abstract) , 1995, PODS.
[21] Andrei Z. Broder,et al. Graph structure in the Web , 2000, Comput. Networks.
[22] Gordon S. Blair,et al. A generic component model for building systems software , 2008, TOCS.
[23] Yuan Yu,et al. Dryad: distributed data-parallel programs from sequential building blocks , 2007, EuroSys '07.
[24] Bala Kalyanasundaram,et al. An optimal deterministic algorithm for online b-matching , 1996, Theor. Comput. Sci..
[25] Jeffrey D. Ullman,et al. Optimizing joins in a map-reduce environment , 2010, EDBT '10.
[26] Jeffrey Xu Yu,et al. Efficient similarity joins for near duplicate detection , 2008, WWW.
[27] Marco Rosa,et al. HyperANF: approximating the neighbourhood function of very large graphs on a budget , 2010, WWW.
[28] Jennifer Widom,et al. Database Systems: The Complete Book , 2001 .
[29] Nicole Immorlica,et al. Locality-sensitive hashing scheme based on p-stable distributions , 2004, SCG '04.
[30] Sanjay Ghemawat,et al. MapReduce: Simplified Data Processing on Large Clusters , 2004, OSDI.
[31] Monika Henzinger,et al. Finding near-duplicate web pages: a large-scale evaluation of algorithms , 2006, SIGIR.
[32] Donald Ervin Knuth,et al. The Art of Computer Programming , 1968 .
[33] Luis von Ahn. Games with a Purpose , 2006, Computer.
[34] M. E. J. Newman,et al. Community structure in social and biological networks , 2001, Proceedings of the National Academy of Sciences of the United States of America.
[35] Sudipto Guha,et al. CURE: an efficient clustering algorithm for large databases , 1998, SIGMOD '98.
[36] Robert J. Kauffman,et al. Understanding evolution in technology ecosystems , 2008, Commun. ACM.
[37] Yannis E. Ioannidis,et al. On the Computation of the Transitive Closure of Relational Operators , 1986, VLDB.
[38] Ravi Kumar,et al. Trawling the Web for Emerging Cyber-Communities , 1999, Comput. Networks.
[39] Nick Craswell,et al. An experimental comparison of click position-bias models , 2008, WSDM '08.
[40] Sergei Vassilvitskii,et al. Counting triangles and the curse of the last reducer , 2011, WWW.
[41] Gene H. Golub,et al. Matrix computations , 1983 .
[42] Jure Leskovec,et al. Community Structure in Large Networks: Natural Cluster Sizes and the Absence of Large Well-Defined Clusters , 2008, Internet Math..
[43] Hector Garcia-Molina,et al. Link spam detection based on mass estimation , 2006, VLDB.
[44] Taher H. Haveliwala. Topic-sensitive PageRank , 2002, IEEE Trans. Knowl. Data Eng..
[45] N. Littlestone. Learning Quickly When Irrelevant Attributes Abound: A New Linear-Threshold Algorithm , 1987, 28th Annual Symposium on Foundations of Computer Science (sfcs 1987).
[46] Hans-Arno Jacobsen,et al. PNUTS: Yahoo!'s hosted data serving platform , 2008, Proc. VLDB Endow..
[47] Gene H. Golub,et al. Calculating the singular values and pseudo-inverse of a matrix , 2007, Milestones in Matrix Computation.
[48] Jeffrey D. Ullman,et al. A New Computation Model for Cluster Computing , 2009 .
[49] Alan M. Frieze,et al. Min-Wise Independent Permutations , 2000, J. Comput. Syst. Sci..
[50] Sergey Brin,et al. The Anatomy of a Large-Scale Hypertextual Web Search Engine , 1998, Comput. Networks.
[51] Jennifer Widom,et al. SimRank: a measure of structural-context similarity , 2002, KDD.
[52] Michael Isard,et al. DryadLINQ: A System for General-Purpose Distributed Data-Parallel Computing Using a High-Level Language , 2008, OSDI.
[53] David J. DeWitt,et al. Clustera: an integrated computation and data management system , 2008, Proc. VLDB Endow..
[54] Vipin Kumar,et al. Introduction to Data Mining , 2022, Data Mining and Machine Learning Applications.
[55] F. Rosenblatt,et al. The perceptron: a probabilistic model for information storage and organization in the brain , 1958, Psychological Review.
[56] Erhard Rahm,et al. Similarity flooding: a versatile graph matching algorithm and its application to schema matching , 2002, Proceedings 18th International Conference on Data Engineering.
[57] Jon Kleinberg,et al. Authoritative sources in a hyperlinked environment , 1999, SODA '98.
[58] Udi Manber,et al. Finding Similar Files in a Large File System , 1994, USENIX Winter.
[59] Rajeev Motwani,et al. Computing Iceberg Queries Efficiently , 1998, VLDB.
[60] Mohamed Medhat Gaber,et al. Scientific Data Mining and Knowledge Discovery - Principles and Foundations , 2009 .
[61] Howard Gobioff,et al. The Google file system , 2003, SOSP '03.
[62] Ramakrishnan Srikant,et al. Fast algorithms for mining association rules , 1998, VLDB 1998.
[63] Christopher J. C. Burges,et al. A Tutorial on Support Vector Machines for Pattern Recognition , 1998, Data Mining and Knowledge Discovery.
[64] Christos Faloutsos,et al. Fast Random Walk with Restart and Its Applications , 2006, Sixth International Conference on Data Mining (ICDM'06).
[65] Michael D. Ernst,et al. HaLoop , 2010, Proc. VLDB Endow..
[66] Aranyak Mehta,et al. AdWords and Generalized On-line Matching , 2005, FOCS.
[67] T. Landauer,et al. Indexing by Latent Semantic Analysis , 1990 .
[68] Burton H. Bloom,et al. Space/time trade-offs in hash coding with allowable errors , 1970, CACM.
[69] Piotr Indyk,et al. Approximate Nearest Neighbor: Towards Removing the Curse of Dimensionality , 2012, Theory Comput..
[70] Alexandr Andoni,et al. Near-Optimal Hashing Algorithms for Approximate Nearest Neighbor in High Dimensions , 2006, 2006 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS'06).
[71] Luis von Ahn. Games with a Purpose , 2006 .
[72] James C. French,et al. Clustering large datasets in arbitrary metric spaces , 1999, Proceedings 15th International Conference on Data Engineering (Cat. No.99CB36337).
[73] Jennifer Widom,et al. Models and issues in data stream systems , 2002, PODS.
[74] Wilson C. Hsieh,et al. Bigtable: A Distributed Storage System for Structured Data , 2006, TOCS.
[75] Surajit Chaudhuri,et al. A Primitive Operator for Similarity Joins in Data Cleaning , 2006, 22nd International Conference on Data Engineering (ICDE'06).
[76] Léon Bottou,et al. Large-Scale Machine Learning with Stochastic Gradient Descent , 2010, COMPSTAT.
[77] Rajeev Motwani,et al. Maintaining variance and k-medians over data stream windows , 2003, PODS.
[78] Jimeng Sun,et al. Less is More: Compact Matrix Decomposition for Large Sparse Graphs , 2007, SDM.
[79] Jitendra Malik,et al. Normalized Cuts and Image Segmentation , 2000, IEEE Trans. Pattern Anal. Mach. Intell..
[80] Piotr Indyk,et al. Similarity Search in High Dimensions via Hashing , 1999, VLDB.
[81] Jeffrey Scott Vitter,et al. Random sampling with a reservoir , 1985, TOMS.
[82] Corinna Cortes,et al. Support-Vector Networks , 1995, Machine Learning.
[83] Moses Charikar,et al. Similarity estimation techniques from rounding algorithms , 2002, STOC '02.
[84] Santo Fortunato,et al. Community detection in graphs , 2009, ArXiv.
[85] Marvin Minsky,et al. Perceptrons: An Introduction to Computational Geometry , 1969 .
[86] Karl Pearson. On lines and planes of closest fit to systems of points in space , 1901 .
[87] Tian Zhang,et al. BIRCH: an efficient data clustering method for very large databases , 1996, SIGMOD '96.
[88] Gediminas Adomavicius,et al. Toward the next generation of recommender systems: a survey of the state-of-the-art and possible extensions , 2005, IEEE Transactions on Knowledge and Data Engineering.
[89] Nello Cristianini,et al. An Introduction to Support Vector Machines and Other Kernel-based Learning Methods , 2000 .
[90] Christos Faloutsos,et al. DOULION: counting triangles in massive graphs with a coin , 2009, KDD.
[91] R. Merton. The Matthew Effect in Science , 1968, Science.
[92] Jeffrey D. Ullman,et al. Cluster Computing, Recursion and Datalog , 2010, Datalog.
[93] Patrick Valduriez,et al. Evaluation of Recursive Queries Using Join Indices , 1986, Expert Database Conf..
[94] Aart J. C. Bik,et al. Pregel: a system for large-scale graph processing , 2010, SIGMOD Conference.
[95] Yehuda Koren,et al. The BellKor Solution to the Netflix Grand Prize , 2009 .
[96] Avrim Blum,et al. Empirical Support for Winnow and Weighted-Majority Algorithms: Results on a Calendar Scheduling Domain , 2004, Machine Learning.
[97] Yoav Freund,et al. Large Margin Classification Using the Perceptron Algorithm , 1998, COLT.
[98] Piotr Indyk,et al. Maintaining Stream Statistics over Sliding Windows , 2002, SIAM J. Comput..
[99] Noga Alon,et al. The space complexity of approximating the frequency moments , 1996, STOC '96.
[100] Piotr Indyk,et al. Approximate nearest neighbors: towards removing the curse of dimensionality , 1998, STOC '98.
[101] Ravi Kumar,et al. Pig latin: a not-so-foreign language for data processing , 2008, SIGMOD Conference.
[102] Christos Faloutsos,et al. ANF: a fast and scalable tool for data mining in massive graphs , 2002, KDD.
[103] Greg Linden,et al. Amazon.com Recommendations: Item-to-Item Collaborative Filtering , 2001 .
[104] Andrei Z. Broder,et al. On the resemblance and containment of documents , 1997, Proceedings. Compression and Complexity of SEQUENCES 1997 (Cat. No.97TB100171).