Data mining: practical machine learning tools and techniques, 3rd Edition

Data Mining: Practical Machine Learning Tools and Techniques offers a thorough grounding in machine learning concepts as well as practical advice on applying machine learning tools and techniques in real-world data mining situations. This highly anticipated third edition of the most acclaimed work on data mining and machine learning will teach you everything you need to know about preparing inputs, interpreting outputs, evaluating results, and the algorithmic methods at the heart of successful data mining. Thorough updates reflect the technical changes and modernizations that have taken place in the field since the last edition, including new material on data transformations, ensemble learning, massive data sets, and multi-instance learning, plus a new version of the popular Weka machine learning software developed by the authors. Witten, Frank, and Hall include both the tried-and-true techniques of today and methods at the leading edge of contemporary research.

* Provides a thorough grounding in machine learning concepts as well as practical advice on applying the tools and techniques to your data mining projects.
* Offers concrete tips and techniques for performance improvement that work by transforming the input or output in machine learning methods.
* Includes the downloadable Weka software toolkit, a collection of machine learning algorithms for data mining tasks, in an updated, interactive interface. Algorithms in the toolkit cover data pre-processing, classification, regression, clustering, association rules, and visualization.

[1]  R. Fisher The Use of Multiple Measurements in Taxonomic Problems , 1936 .

[2]  S S Stevens,et al.  On the Theory of Scales of Measurement. , 1946, Science.

[3]  Jacob Cohen A Coefficient of Agreement for Nominal Scales , 1960 .

[4]  Thomas Marill,et al.  On the effectiveness of receptors in recognition systems , 1963, IEEE Trans. Inf. Theory.

[5]  A. Koestler The Act of Creation , 1964 .

[6]  Peter E. Hart,et al.  Nearest neighbor pattern classification , 1967, IEEE Trans. Inf. Theory.

[7]  William Mendenhall,et al.  Introduction to Probability and Statistics , 1961, The Mathematical Gazette.

[8]  A. E. Hoerl,et al.  Ridge Regression: Applications to Nonorthogonal Problems , 1970 .

[9]  Second Edition,et al.  Statistical Package for the Social Sciences , 1970 .

[10]  Richard O. Duda,et al.  Pattern classification and scene analysis , 1974, A Wiley-Interscience publication.

[11]  James P. Egan,et al.  Signal detection theory and ROC analysis , 1975 .

[12]  John A. Hartigan,et al.  Clustering Algorithms , 1975 .

[13]  Jon Louis Bentley,et al.  An Algorithm for Finding Best Matches in Logarithmic Expected Time , 1977, TOMS.

[14]  Vic Barnett,et al.  Outliers in Statistical Data , 1980 .

[15]  Leo Breiman,et al.  Classification and Regression Trees , 1984 .

[16]  H. Edelsbrunner,et al.  Efficient algorithms for agglomerative hierarchical clustering methods , 1984 .

[17]  E. Asmis Epicurus' Scientific Method , 1988 .

[18]  Mark A. Gluck,et al.  Information, Uncertainty and the Utility of Categories , 1985 .

[19]  Jeffrey Scott Vitter,et al.  Random sampling with a reservoir , 1985, TOMS.

[20]  Nancy Martin,et al.  Programming Expert Systems in OPS5 - An Introduction to Rule-Based Programming(1) , 1985, Int. CMG Conference.

[21]  David B. Shmoys,et al.  A Best Possible Heuristic for the k-Center Problem , 1985, Math. Oper. Res..

[22]  J R Beck,et al.  The use of relative operating characteristic (ROC) curves in test performance evaluation. , 1986, Archives of pathology & laboratory medicine.

[23]  Peter J. Rousseeuw,et al.  Robust regression and outlier detection , 1987 .

[24]  Jadzia Cendrowska,et al.  PRISM: An Algorithm for Inducing Modular Rules , 1987, Int. J. Man Mach. Stud..

[25]  N. Littlestone Learning Quickly When Irrelevant Attributes Abound: A New Linear-Threshold Algorithm , 1987, 28th Annual Symposium on Foundations of Computer Science (sfcs 1987).

[26]  David W. Aha,et al.  Learning Representative Exemplars of Concepts: An Initial Case Study , 1987 .

[27]  Stephen M. Omohundro,et al.  Efficient Algorithms with Neural Network Behavior , 1987, Complex Syst..

[28]  K. Jabbour,et al.  ALFA: automated load forecasting assistant , 1988 .

[29]  David E. Goldberg,et al.  Genetic Algorithms in Search Optimization and Machine Learning , 1988 .

[30]  J A Swets,et al.  Measuring the accuracy of diagnostic systems. , 1988, Science.

[31]  Pat Langley,et al.  Models of Incremental Concept Formation , 1990, Artif. Intell..

[32]  Andrew W. Moore,et al.  Efficient memory-based learning for robot control , 1990 .

[33]  N. Littlestone Mistake bounds and logarithmic linear-threshold learning algorithms , 1990 .

[34]  William Frawley,et al.  Knowledge Discovery in Databases , 1991 .

[35]  Thomas G. Dietterich,et al.  Learning with Many Irrelevant Features , 1991, AAAI.

[36]  Larry A. Rendell,et al.  A Practical Approach to Feature Selection , 1992, ML.

[37]  Pat Langley,et al.  An Analysis of Bayesian Classifiers , 1992, AAAI.

[38]  Wray L. Buntine,et al.  Learning classification trees , 1992 .

[39]  Kenneth A. De Jong,et al.  Genetic algorithms as a tool for feature selection in machine learning , 1992, Proceedings Fourth International Conference on Tools with Artificial Intelligence TAI '92.

[40]  David H. Wolpert,et al.  Stacked generalization , 1992, Neural Networks.

[41]  David W. Aha,et al.  Tolerating Noisy, Irrelevant and Novel Attributes in Instance-Based Learning Algorithms , 1992, Int. J. Man Mach. Stud..

[42]  Ming Li,et al.  Inductive Reasoning and Kolmogorov Complexity , 1992, J. Comput. Syst. Sci..

[43]  Thomas G. Dietterich,et al.  Efficient Algorithms for Identifying Relevant Features , 1992 .

[44]  S. Cessie,et al.  Ridge Estimators in Logistic Regression , 1992 .

[45]  Randy Kerber,et al.  ChiMerge: Discretization of Numeric Attributes , 1992, AAAI.

[46]  Robert Tibshirani,et al.  An Introduction to the Bootstrap , 1994 .

[47]  Usama M. Fayyad,et al.  Multi-Interval Discretization of Continuous-Valued Attributes for Classification Learning , 1993, IJCAI.

[48]  Claire Cardie,et al.  Using Decision Trees to Improve Case-Based Learning , 1993, ICML.

[49]  Henry Lieberman,et al.  Watch what I do: programming by demonstration , 1993 .

[50]  Tomasz Imielinski,et al.  Mining association rules between sets of items in large databases , 1993, SIGMOD Conference.

[51]  Tomasz Imielinski,et al.  Database Mining: A Performance Perspective , 1993, IEEE Trans. Knowl. Data Eng..

[52]  Dee Andy Michel,et al.  Library research models: A guide to classification, cataloging and computers: Mann, Thomas. New York: Oxford University Press, 1993. 248 pp. $22.50 (ISBN: 0-19-5-8190-0). , 1994 .

[53]  W. B. Cavnar,et al.  N-gram-based text categorization , 1994 .

[54]  Ramakrishnan Srikant,et al.  Fast Algorithms for Mining Association Rules in Large Databases , 1994, VLDB.

[55]  Tom M. Mitchell,et al.  Experience with a learning personal assistant , 1994, CACM.

[56]  Andrew W. Moore,et al.  Efficient Algorithms for Minimizing Cross Validation Error , 1994, ICML.

[57]  Pat Langley,et al.  Elements of Machine Learning , 1995 .

[58]  Pat Langley,et al.  Induction of Selective Bayesian Classifiers , 1994, UAI.

[59]  Johannes Fürnkranz,et al.  Incremental Reduced Error Pruning , 1994, ICML.

[60]  Ron Kohavi,et al.  Irrelevant Features and the Subset Selection Problem , 1994, ICML.

[61]  Thomas G. Dietterich,et al.  Error-Correcting Output Coding Corrects Bias and Variance , 1995, ICML.

[62]  Peter Norvig,et al.  Artificial Intelligence: A Modern Approach , 1995 .

[63]  L. Wasserman,et al.  A Reference Bayesian Test for Nested Hypotheses and its Relationship to the Schwarz Criterion , 1995 .

[64]  R. Bouckaert Bayesian belief networks : from construction to inference , 1995 .

[65]  William W. Cohen Fast Effective Rule Induction , 1995, ICML.

[66]  Pat Langley,et al.  Estimating Continuous Distributions in Bayesian Classifiers , 1995, UAI.

[67]  David Fisher,et al.  CRYSTAL: Inducing a Conceptual Dictionary , 1995, IJCAI.

[68]  P. Good,et al.  Permutation Tests: A Practical Guide to Resampling Methods for Testing Hypotheses , 1995 .

[69]  George H. John Robust Decision Trees: Removing Outliers from Databases , 1995, KDD.

[70]  Ron Kohavi,et al.  A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection , 1995, IJCAI.

[71]  Scott B. Huffman,et al.  Learning information extraction patterns from examples , 1995, Learning for Natural Language Processing.

[72]  Geoffrey Holmes,et al.  Feature selection via the discovery of simple classification rules , 1995 .

[73]  Igor Kononenko,et al.  On Biases in Estimating Multi-Valued Attributes , 1995, IJCAI.

[74]  John G. Cleary,et al.  K*: An Instance-based Learner Using an Entropic Distance Measure , 1995, ICML.

[75]  Ron Kohavi,et al.  The Power of Decision Tables , 1995, ECML.

[76]  Charles L. Lawson,et al.  Solving least squares problems , 1976, Classics in applied mathematics.

[77]  Christopher M. Bishop,et al.  Neural networks for pattern recognition , 1995 .

[78]  Herbert A. Simon,et al.  Applications of machine learning and rule induction , 1995, CACM.

[79]  Ron Kohavi,et al.  Supervised and Unsupervised Discretization of Continuous Features , 1995, ICML.

[80]  Thomas G. Dietterich,et al.  Solving Multiclass Learning Problems via Error-Correcting Output Codes , 1994, J. Artif. Intell. Res..

[81]  Francesco Bergadano,et al.  Inductive Logic Programming: From Machine Learning to Software Engineering , 1995 .

[82]  Brent Martin,et al.  Instance-Based Learning: Nearest Neighbour with Generalisation , 1995 .

[83]  J. Ross Quinlan,et al.  Improved Use of Continuous Attributes in C4.5 , 1996, J. Artif. Intell. Res..

[84]  Brian D. Ripley,et al.  Pattern Recognition and Neural Networks , 1996 .

[85]  Rakesh Agrawal,et al.  SPRINT: A Scalable Parallel Classifier for Data Mining , 1996, VLDB.

[86]  Philip S. Yu,et al.  Data Mining: An Overview from a Database Perspective , 1996, IEEE Trans. Knowl. Data Eng..

[87]  Ron Kohavi,et al.  Scaling Up the Accuracy of Naive-Bayes Classifiers: A Decision-Tree Hybrid , 1996, KDD.

[88]  Tian Zhang,et al.  BIRCH: an efficient data clustering method for very large databases , 1996, SIGMOD '96.

[89]  Jorma Rissanen,et al.  SLIQ: A Fast Scalable Classifier for Data Mining , 1996, EDBT.

[90]  Peter C. Cheeseman,et al.  Bayesian Classification (AutoClass): Theory and Results , 1996, Advances in Knowledge Discovery and Data Mining.

[91]  László Györfi,et al.  A Probabilistic Theory of Pattern Recognition , 1996, Stochastic Modelling and Applied Probability.

[92]  Jude W. Shavlik,et al.  Growing Simpler Decision Trees to Facilitate Knowledge Discovery , 1996, KDD.

[93]  Hans-Peter Kriegel,et al.  A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise , 1996, KDD.

[94]  Yoav Freund,et al.  Experiments with a New Boosting Algorithm , 1996, ICML.

[95]  Huan Liu,et al.  A Probabilistic Approach to Feature Selection - A Filter Solution , 1996, ICML.

[96]  Ralph Kimball,et al.  The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling , 1996 .

[97]  Christophe Giraud-Carrier,et al.  FLARE: Induction with Prior Knowledge , 1996 .

[98]  Joseph P. Bigus,et al.  Data mining with neural networks: solving business problems from application development to decision support , 1996 .

[99]  Ramakrishnan Srikant,et al.  Mining Sequential Patterns: Generalizations and Performance Improvements , 1996, EDBT.

[100]  Ian H. Witten,et al.  Induction of model trees for predicting continuous classes , 1996 .

[101]  Vasant Dhar,et al.  Seven Methods for Transforming Corporate Data Into Business Intelligence , 1996 .

[102]  Carla E. Brodley,et al.  Identifying and Eliminating Mislabeled Training Instances , 1996, AAAI/IAAI, Vol. 1.

[103]  Ron Kohavi,et al.  Error-Based and Entropy-Based Discretization of Continuous Features , 1996, KDD.

[104]  Ron Kohavi,et al.  Wrappers for Feature Subset Selection , 1997, Artif. Intell..

[105]  Raymond J. Mooney,et al.  Relational Learning of Pattern-Match Rules for Information Extraction , 1999, CoNLL.

[106]  Thomas G. Dietterich,et al.  Solving the Multiple Instance Problem with Axis-Parallel Rectangles , 1997, Artif. Intell..

[107]  Ron Kohavi,et al.  Option Decision Trees with Majority Votes , 1997, ICML.

[108]  George H. John Enhancements to the data mining process , 1997 .

[109]  Tom Fawcett,et al.  Analysis and Visualization of Classifier Performance: Comparison under Imprecise Class and Cost Distributions , 1997, KDD.

[110]  Yoav Freund,et al.  Boosting the margin: A new explanation for the effectiveness of voting methods , 1997, ICML.

[111]  Tomás Lozano-Pérez,et al.  A Framework for Multiple-Instance Learning , 1997, NIPS.

[112]  Rajeev Motwani,et al.  Dynamic itemset counting and implication rules for market basket data , 1997, SIGMOD '97.

[113]  Huan Liu,et al.  Feature Selection via Discretization , 1997, IEEE Trans. Knowl. Data Eng..

[114]  Harris Drucker,et al.  Improving Regressors using Boosting Techniques , 1997, ICML.

[115]  Pedro M. Domingos Knowledge Acquisition from Examples via Multiple Models , 1997, ICML.

[116]  Ian H. Witten,et al.  Stacked generalization: when does it work? , 1997, IJCAI 1997.

[117]  Rolf Stadler,et al.  Discovering Data Mining: From Concept to Implementation , 1997 .

[118]  Michael J. A. Berry,et al.  Data mining techniques - for marketing, sales, and customer support , 1997, Wiley computer publishing.

[119]  Ian H. Witten,et al.  Stacking Bagged and Dagged Models , 1997, ICML.

[120]  Srinivasan Parthasarathy,et al.  New Algorithms for Fast Discovery of Association Rules , 1997, KDD.

[121]  Tim Oates,et al.  The Effects of Training Set Size on Decision Tree Complexity , 1997, ICML.

[122]  H. Altay Güvenir,et al.  Classification by Voting Feature Intervals , 1997, ECML.

[123]  Robert Tibshirani,et al.  Classification by Pairwise Coupling , 1997, NIPS.

[124]  Nicholas Kushmerick,et al.  Wrapper Induction for Information Extraction , 1997, IJCAI.

[125]  Susan T. Dumais,et al.  Inductive learning algorithms and representations for text categorization , 1998, CIKM '98.

[126]  Andrew McCallum,et al.  A comparison of event models for naive bayes text classification , 1998, AAAI 1998.

[127]  Sergey Brin,et al.  The Anatomy of a Large-Scale Hypertextual Web Search Engine , 1998, Comput. Networks.

[128]  Avrim Blum,et al.  The Bottleneck , 2021, Monopsony Capitalism.

[129]  Oded Maron,et al.  Learning from Ambiguity , 1998 .

[130]  Debbie Richards,et al.  Taking up the situated cognition challenge with ripple down rules , 1998, Int. J. Hum. Comput. Stud..

[131]  David W. Aha,et al.  Error-Correcting Output Codes for Local Learners , 1998, ECML.

[132]  Wynne Hsu,et al.  Integrating Classification and Association Rule Mining , 1998, KDD.

[133]  Yoav Freund,et al.  Large Margin Classification Using the Perceptron Algorithm , 1998, COLT.

[134]  Ian H. Witten,et al.  Generating Accurate Rule Sets Without Global Optimization , 1998, ICML.

[135]  Susan T. Dumais,et al.  A Bayesian Approach to Filtering Junk E-Mail , 1998, AAAI 1998.

[136]  Bernhard Schölkopf,et al.  Shrinking the Tube: A New Support Vector Regression Algorithm , 1998, NIPS.

[137]  Tin Kam Ho,et al.  The Random Subspace Method for Constructing Decision Forests , 1998, IEEE Trans. Pattern Anal. Mach. Intell..

[138]  Andrew W. Moore,et al.  Cached Sufficient Statistics for Efficient Machine Learning with Large Datasets , 1998, J. Artif. Intell. Res..

[139]  Stephen D. Bay Nearest neighbor classification from multiple feature subsets , 1999, Intell. Data Anal..

[140]  Geoff Holmes,et al.  Generating Rule Sets from Model Trees , 1999, Australian Joint Conference on Artificial Intelligence.

[141]  Yoav Freund,et al.  The Alternating Decision Tree Learning Algorithm , 1999, ICML.

[142]  Pedro M. Domingos MetaCost: a general method for making classifiers cost-sensitive , 1999, KDD '99.

[143]  Douglas E. Appelt,et al.  Introduction to Information Extraction , 1999, AI Commun..

[144]  Dorian Pyle,et al.  Data Preparation for Data Mining , 1999 .

[145]  Hans-Peter Kriegel,et al.  OPTICS: ordering points to identify the clustering structure , 1999, SIGMOD '99.

[146]  Carl Gutwin,et al.  Domain-Specific Keyphrase Extraction , 1999, IJCAI.

[147]  Ian H. Witten,et al.  Making Better Use of Global Discretization , 1999, ICML.

[148]  Bernhard Schölkopf,et al.  Support Vector Method for Novelty Detection , 1999, NIPS.

[149]  Geoffrey I. Webb Decision Tree Grafting From the All Tests But One Partition , 1999, IJCAI.

[150]  Ian H. Witten,et al.  Text mining: a new frontier for lossless compression , 1999, Proceedings DCC'99 Data Compression Conference (Cat. No. PR00096).

[151]  Ian H. Witten,et al.  Managing gigabytes (2nd ed.): compressing and indexing documents and images , 1999 .

[152]  Jon Kleinberg,et al.  Authoritative sources in a hyperlinked environment , 1999, SODA '98.

[153]  Manuela M. Veloso,et al.  Multiagent Systems: A Survey from a Machine Learning Perspective , 2000, Auton. Robots.

[154]  Un Yong Nahm,et al.  Using Information Extraction to Aid the Discovery of Prediction Rules from Text , 2000 .

[155]  Raghu Ramakrishnan,et al.  Proceedings : KDD 2000 : the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, August 20-23, 2000, Boston, MA, USA , 2000 .

[156]  Gordon W. Paynter,et al.  Automating iterative tasks with programming by demonstration , 2000 .

[157]  Eibe Frank,et al.  Pruning Decision Trees and Lists , 2000 .

[158]  Jan Ramon,et al.  Multi instance neural networks , 2000, ICML 2000.

[159]  Andrew W. Moore,et al.  The Anchors Hierarchy: Using the Triangle Inequality to Survive High Dimensional Data , 2000, UAI.

[160]  Nello Cristianini,et al.  An Introduction to Support Vector Machines and Other Kernel-based Learning Methods , 2000 .

[161]  J. Friedman Special Invited Paper-Additive logistic regression: A statistical view of boosting , 2000 .

[162]  Vladimir N. Vapnik,et al.  The Nature of Statistical Learning Theory , 2000, Statistics for Engineering and Information Science.

[163]  Rayid Ghani,et al.  Analyzing the effectiveness and applicability of co-training , 2000, CIKM '00.

[164]  Jun Wang,et al.  Solving the Multiple-Instance Problem: A Lazy Learning Approach , 2000, ICML.

[165]  Robert C. Holte,et al.  Explicitly representing expected cost: an alternative to ROC representation , 2000, KDD '00.

[166]  Andrew W. Moore,et al.  A Dynamic Adaptation of AD-trees for Efficient Machine Learning on Large Data Sets , 2000, ICML.

[167]  Mark A. Hall,et al.  Correlation-based Feature Selection for Discrete and Numeric Class Machine Learning , 1999, ICML.

[168]  Jiawei Han,et al.  Data Mining: Concepts and Techniques , 2000 .

[169]  Jian Pei,et al.  Mining frequent patterns without candidate generation , 2000, SIGMOD 2000.

[170]  Andrew W. Moore,et al.  X-means: Extending K-means with Efficient Estimation of the Number of Clusters , 2000, ICML.

[171]  Yann Chevaleyre,et al.  Solving Multiple-Instance and Multiple-Part Learning Problems with Decision Trees and Rule Sets. Application to the Mutagenesis Problem , 2001, Canadian Conference on AI.