Solving Regression by Learning an Ensemble of Decision Rules

We introduce a novel decision rule induction algorithm for solving the regression problem. Only a few approaches apply decision rules to this type of prediction problem. The algorithm uses a single decision rule as the base learner in the ensemble. Forward stagewise additive modeling is used to build the ensemble of decision rules. We consider two loss functions commonly used in regression problems: the squared-error loss and the absolute-error loss. The empirical risk based on these loss functions is minimized by two optimization techniques: gradient boosting and the least angle technique. The main advantages of decision rules are their simplicity and good interpretability. An ensemble of decision rules is also a powerful prediction model, as shown by the experimental results presented in the paper.
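The forward stagewise construction described above can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes squared-error loss only, restricts each rule to a single elementary condition (feature <= threshold or feature > threshold), and uses a fixed shrinkage factor. The names fit_rule, fit_ensemble, predict, n_rules and shrinkage are hypothetical and chosen for illustration.

```python
import numpy as np

def _covers(X, j, t, is_leq):
    """Boolean mask of the examples covered by a one-condition rule."""
    return X[:, j] <= t if is_leq else X[:, j] > t

def fit_rule(X, residuals):
    """Greedily pick the one-condition rule that most reduces squared error.

    Assumption: a rule here is a single condition on one feature; a full
    rule learner would search over conjunctions of such conditions.
    """
    best = None  # (gain, feature, threshold, is_leq, response)
    for j in range(X.shape[1]):
        # Skip the largest value so that both sides of a split are non-empty.
        for t in np.unique(X[:, j])[:-1]:
            for is_leq in (True, False):
                r = residuals[_covers(X, j, t, is_leq)]
                # Squared-error reduction when the rule outputs mean(r).
                gain = r.sum() ** 2 / r.size
                if best is None or gain > best[0]:
                    best = (gain, j, t, is_leq, r.mean())
    return None if best is None else best[1:]

def fit_ensemble(X, y, n_rules=50, shrinkage=0.5):
    """Forward stagewise additive modeling: add one rule per iteration."""
    default = float(np.mean(y))          # default rule covering all examples
    pred = np.full(len(y), default)
    rules = []
    for _ in range(n_rules):
        # For squared-error loss the negative gradient is the residual.
        rule = fit_rule(X, y - pred)
        if rule is None:                 # no usable split (constant features)
            break
        j, t, is_leq, response = rule
        step = shrinkage * response
        pred[_covers(X, j, t, is_leq)] += step
        rules.append((j, t, is_leq, step))
    return default, rules

def predict(model, X):
    """Sum the responses of all rules that cover each example."""
    default, rules = model
    pred = np.full(X.shape[0], default)
    for j, t, is_leq, step in rules:
        pred[_covers(X, j, t, is_leq)] += step
    return pred
```

Each rule contributes its response only to the examples it covers and abstains elsewhere, so the ensemble stays readable as a list of "if condition then add value" statements. Handling the absolute-error loss would require replacing the residual and the mean response with the sign of the residual and a median-based response, which this sketch does not attempt.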
