Data Mining and Analysis: Fundamental Concepts and Algorithms

The fundamental algorithms in data mining and analysis form the basis for the emerging field of data science, which includes automated methods to analyze patterns and models for all kinds of data, with applications ranging from scientific discovery to business intelligence and analytics. This textbook for senior undergraduate and graduate data mining courses provides a broad yet in-depth overview of data mining, integrating related concepts from machine learning and statistics. The main parts of the book include exploratory data analysis, pattern mining, clustering, and classification. The book lays the basic foundations of these tasks, and also covers cutting-edge topics such as kernel methods, high-dimensional data analysis, and complex graphs and networks. With its comprehensive coverage, algorithmic perspective, and wealth of examples, this book offers solid guidance in data mining for students, researchers, and practitioners alike.

Key features:
Covers both core methods and cutting-edge research.
Algorithmic approach with open-source implementations.
Minimal prerequisites: all key mathematical concepts are presented, as is the intuition behind the formulas.
Short, self-contained chapters with class-tested examples and exercises allow for flexibility in designing a course and for easy reference.
Supplementary website with lecture slides, videos, project ideas, and more.
