Chapter 4 – Classification

In classification, or class prediction, we use the information carried by the predictors (independent variables) to sort data samples into two or more distinct classes. Classification is the most widely used data mining task in business, and there are several ways to build classification models. In this chapter, we discuss and implement six of the most commonly used classification algorithms: decision trees, rule induction, k-nearest neighbors, naive Bayes, artificial neural networks, and support vector machines. The chapter concludes with ensemble classification models and a discussion of bagging, boosting, and random forests.
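To make the task concrete, here is a minimal sketch (not taken from the chapter) of one of the listed algorithms, k-nearest neighbors, in plain Python: each unseen sample is assigned the majority class among its k closest training samples. The toy data, function name, and choice of Euclidean distance are illustrative assumptions.

```python
from collections import Counter
import math

def knn_predict(train_X, train_y, x, k=3):
    # Hypothetical helper for illustration: classify query point x by
    # majority vote among its k nearest training samples (Euclidean distance).
    dists = sorted(
        (math.dist(x, xi), yi) for xi, yi in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

# Toy two-class data set: two predictor values per sample.
train_X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
           (5.0, 5.0), (5.2, 4.8), (4.9, 5.1)]
train_y = ["A", "A", "A", "B", "B", "B"]

print(knn_predict(train_X, train_y, (1.1, 0.9)))  # a query point near class A
```

The same predict-by-comparison pattern underlies the other algorithms in the chapter; they differ in how the mapping from predictors to classes is learned and represented.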
