Convex Regression with Interpretable Sharp Partitions

We consider the problem of predicting an outcome variable on the basis of a small number of covariates, using an interpretable yet non-additive model. We propose convex regression with interpretable sharp partitions (CRISP) for this task. CRISP partitions the covariate space into blocks in a data-adaptive way, and fits a mean model within each block. Unlike other partitioning methods, CRISP is fit using a non-greedy approach by solving a convex optimization problem, resulting in low-variance fits. We explore the properties of CRISP, and evaluate its performance in a simulation study and on a housing price data set.
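As a concrete illustration of the convex, block-partitioning fit described above, the sketch below sets up a CRISP-style objective for two covariates using the cvxpy modeling library. It is a minimal sketch under assumptions: the quantile binning into a q-by-q grid, the group penalty that fuses adjacent rows and columns of a mean matrix, and the names fit_crisp_like, q, and lam are illustrative choices, not the authors' implementation.

# Hedged sketch of a CRISP-style convex fit (assumed formulation, not the paper's code).
import numpy as np
import cvxpy as cp

def fit_crisp_like(x1, x2, y, q=4, lam=1.0):
    """Bin each covariate into q quantile bins, then fit a q-by-q mean matrix M
    whose adjacent rows/columns are encouraged to fuse, giving sharp rectangular blocks."""
    # Assign each observation to a grid cell via quantile bins (an assumption here).
    cuts1 = np.quantile(x1, np.linspace(0, 1, q + 1)[1:-1])
    cuts2 = np.quantile(x2, np.linspace(0, 1, q + 1)[1:-1])
    r = np.clip(np.searchsorted(cuts1, x1), 0, q - 1)
    c = np.clip(np.searchsorted(cuts2, x2), 0, q - 1)

    M = cp.Variable((q, q))
    # Squared-error fit of each observation to the mean of its grid cell
    # (built entry by entry for clarity; suitable only for small n).
    fitted = cp.hstack([M[r[i], c[i]] for i in range(len(y))])
    loss = 0.5 * cp.sum_squares(y - fitted)
    # Group penalty on differences of adjacent rows and columns: whole rows/columns
    # fuse together, which is what produces the interpretable sharp partition.
    pen = sum(cp.norm(M[i, :] - M[i + 1, :], 2) + cp.norm(M[:, i] - M[:, i + 1], 2)
              for i in range(q - 1))
    cp.Problem(cp.Minimize(loss + lam * pen)).solve()
    return M.value, (r, c)

In this sketch, larger values of lam fuse more adjacent rows and columns, so the fitted surface collapses into fewer, larger rectangular blocks; with lam near zero the fit is roughly the per-cell sample means for the occupied cells.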
