Machine learning: a review of classification and combining techniques

Supervised classification is one of the tasks most frequently carried out by so-called intelligent systems. As a result, a large number of techniques have been developed, drawing on Artificial Intelligence (logic-based and perceptron-based techniques) and Statistics (Bayesian networks and instance-based techniques). The goal of supervised learning is to build a concise model of the distribution of class labels in terms of predictor features. The resulting classifier is then used to assign class labels to testing instances in which the values of the predictor features are known but the value of the class label is unknown. This paper describes various classification algorithms and a recent approach to improving classification accuracy: ensembles of classifiers.
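The two ideas in the abstract can be illustrated together: a base classifier maps feature values to a class label, and an ensemble combines several base classifiers to improve accuracy. The sketch below is illustrative only, using toy data, a hand-rolled k-nearest-neighbour learner, and an unweighted majority vote; none of the data, function names, or parameters come from the paper.

```python
# A minimal sketch of supervised classification and a voting ensemble,
# written in pure Python on toy data (all names here are illustrative).
from collections import Counter

def knn_predict(train, x, k=1):
    """Classify x by the majority label among its k nearest training points."""
    dist = lambda a, b: sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    neighbors = sorted(train, key=lambda pair: dist(pair[0], x))[:k]
    return Counter(label for _, label in neighbors).most_common(1)[0][0]

def ensemble_predict(classifiers, x):
    """Combine base classifiers by unweighted majority vote."""
    votes = Counter(clf(x) for clf in classifiers)
    return votes.most_common(1)[0][0]

# Toy training set: two classes separated along the first feature.
train = [((0.0, 0.1), "A"), ((0.2, 0.0), "A"),
         ((1.0, 0.9), "B"), ((0.9, 1.1), "B")]

# Three base classifiers: 1-NN, 3-NN, and a hand-written threshold rule.
classifiers = [
    lambda x: knn_predict(train, x, k=1),
    lambda x: knn_predict(train, x, k=3),
    lambda x: "A" if x[0] < 0.5 else "B",
]

print(ensemble_predict(classifiers, (0.1, 0.2)))   # -> A
print(ensemble_predict(classifiers, (0.95, 1.0)))  # -> B
```

Real ensemble methods surveyed in this paper (bagging, boosting, stacking) differ mainly in how the base classifiers are trained and how their votes are weighted; the majority vote above is the simplest combination scheme.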