Optimization of Hierarchical Regression Model with Application to Optimizing Multi-Response Regression K-ary Trees

A fast, convenient, and well-known approach to regression is to induce and prune a binary tree. However, little attempt has been made to improve the performance of a regression tree after it has been induced. This paper presents a meta-algorithm that minimizes the regression loss function and thereby improves the accuracy of any given hierarchical model, such as a k-ary regression tree. The proposed method minimizes the loss function of each node, one node at a time. At split nodes, this amounts to solving an instance-based cost-sensitive classification problem over the node's data points; at leaf nodes, it reduces to a simple regression problem. For binary univariate and multivariate regression trees, the computational complexity of training is linear in the number of samples, so the method scales to large trees and datasets. We also briefly explore applying the proposed method to classification tasks. We show that our algorithm achieves significantly lower test error than other state-of-the-art tree algorithms. Finally, we compare the accuracy, memory usage, and query time of our method with those of recently introduced forest models, and show that, in most cases, the proposed method achieves better or comparable accuracy while offering tangibly faster query time and a smaller number of nonzero weights.
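
To make the node-wise optimization described above concrete, below is a minimal sketch of one top-down pass over a fixed-structure binary regression tree under squared loss. It is not the paper's algorithm; the names (`Node`, `optimize_node`), the axis-aligned splits, the constant leaf predictions, and the exhaustive threshold scan are illustrative assumptions. Each leaf is refit by a simple regression (here, the mean), and each split node is refit by an instance-based cost-sensitive choice of split, where routing a point to a child costs the squared error of that child's current subtree.

```python
import numpy as np


class Node:
    """A node of a fixed-structure binary regression tree (axis-aligned splits)."""

    def __init__(self, feature=None, threshold=0.0, value=0.0, left=None, right=None):
        self.feature = feature        # None marks a leaf
        self.threshold = threshold    # split threshold (unused at leaves)
        self.value = value            # constant prediction (used at leaves)
        self.left, self.right = left, right

    def predict(self, x):
        if self.feature is None:
            return self.value
        child = self.left if x[self.feature] <= self.threshold else self.right
        return child.predict(x)


def optimize_node(node, X, y):
    """Refit `node` to reduce the squared loss of the points routed to it,
    then recurse into its children with the points each child receives."""
    if len(y) == 0:
        return
    if node.feature is None:
        # Leaf node: a simple regression problem; the mean minimizes squared loss.
        node.value = float(np.mean(y))
        return
    # Split node: sending point i left costs the squared error of the left
    # subtree's current prediction on it, and likewise for the right subtree.
    loss_left = np.array([(node.left.predict(x) - t) ** 2 for x, t in zip(X, y)])
    loss_right = np.array([(node.right.predict(x) - t) ** 2 for x, t in zip(X, y)])
    # Instance-based cost-sensitive classification: choose the axis-aligned
    # split whose routing of the points has the smallest total cost.
    best_cost, best = np.inf, (node.feature, node.threshold)
    for f in range(X.shape[1]):
        for thr in np.unique(X[:, f]):
            go_left = X[:, f] <= thr
            cost = loss_left[go_left].sum() + loss_right[~go_left].sum()
            if cost < best_cost:
                best_cost, best = cost, (f, float(thr))
    node.feature, node.threshold = best
    go_left = X[:, node.feature] <= node.threshold
    optimize_node(node.left, X[go_left], y[go_left])
    optimize_node(node.right, X[~go_left], y[~go_left])


if __name__ == "__main__":
    # Toy usage: refit a depth-1 tree whose initial split is deliberately wrong.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 3))
    y = np.where(X[:, 1] > 0.3, 2.0, -1.0) + 0.1 * rng.normal(size=200)
    tree = Node(feature=0, threshold=0.0, left=Node(), right=Node())
    for _ in range(3):              # a few alternating passes over the tree
        optimize_node(tree, X, y)
    print(tree.feature, tree.threshold, tree.left.value, tree.right.value)
```

The double loop over features and candidate thresholds is written for readability; scanning each feature's sorted thresholds with running cost sums would reduce the per-node cost to linear in the number of samples (after sorting), in the spirit of the linear training complexity claimed above.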
