Learning sparse optimal rule fit by safe screening

In this paper, we consider linear prediction models in the form of a sparse linear combination of rules, where a rule is an indicator function defined over a hyperrectangle in the input space. Since the number of all possible rules generated from a training dataset is extremely large, it has been difficult to consider all of them when fitting a sparse model. To resolve this problem, we propose Safe Optimal Rule Fit (SORF), which is formulated as a convex optimization problem with sparse regularization. The proposed SORF method exploits the fact that the set of all possible rules can be represented as a tree. By extending a recently popularized convex optimization technique called safe screening, we develop a novel method for pruning the tree such that pruned nodes are guaranteed to be irrelevant to the prediction model. This approach allows us to efficiently learn a prediction model constructed from the exponentially large set of all possible rules. We demonstrate the usefulness of the proposed method through numerical experiments on several benchmark datasets.
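To make the model class concrete, the following is a minimal sketch (not the authors' implementation) of a rule as a hyperrectangle indicator and of a prediction formed as a sparse linear combination of such rules; the function names `rule` and `predict` and the example bounds are illustrative assumptions.

```python
import numpy as np

def rule(lower, upper):
    """Return an indicator function for the hyperrectangle [lower, upper].

    The rule fires (returns 1.0) exactly when every coordinate of x lies
    within the corresponding interval, and returns 0.0 otherwise.
    """
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    def indicator(x):
        x = np.asarray(x, float)
        return float(np.all((lower <= x) & (x <= upper)))
    return indicator

def predict(x, rules, weights, bias=0.0):
    """Sparse rule model: f(x) = bias + sum_k w_k * r_k(x)."""
    return bias + sum(w * r(x) for r, w in zip(rules, weights))

# Two illustrative rules over a 2-dimensional input space.
r1 = rule([0.0, 0.0], [1.0, 2.0])    # active when 0 <= x1 <= 1 and 0 <= x2 <= 2
r2 = rule([0.5, -1.0], [2.0, 0.5])

# For x = [0.3, 0.2], only r1 fires, so f(x) = 1.5 * 1 + (-0.7) * 0 = 1.5
print(predict([0.3, 0.2], [r1, r2], [1.5, -0.7]))
```

In the actual method, the candidate rules are not enumerated explicitly as above; instead, safe screening bounds are evaluated at tree nodes so that entire subtrees of rules can be discarded without ever materializing them.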
