Interpretable Low-Dimensional Regression via Data-Adaptive Smoothing

We consider the problem of estimating a regression function in the common situation where the number of features is small, where interpretability of the model is a high priority, and where simple linear or additive models fail to provide adequate performance. To address this problem, we present Maximum Variance Total Variation denoising (MVTV), an approach that is conceptually related both to CART and to the more recent CRISP algorithm, a state-of-the-art alternative method for interpretable nonlinear regression. MVTV divides the feature space into blocks of constant value and fits the value of all blocks jointly via a convex optimization routine. Our method is fully data-adaptive, in that it incorporates highly robust routines for tuning all hyperparameters automatically. We compare our approach against CART and CRISP via both a complexity-accuracy tradeoff metric and a human study, demonstrating that that MVTV is a more powerful and interpretable method.

[1]  Suvrit Sra,et al.  Fast Newton-type Methods for Total Variation Regularization , 2011, ICML.

[2]  Noah Simon,et al.  Convex Regression with Interpretable Sharp Partitions , 2016, J. Mach. Learn. Res..

[3]  James G. Scott,et al.  Multiscale Spatial Density Smoothing: An Application to Large-Scale Radiological Survey and Anomaly Detection , 2015, 1507.07271.

[4]  J. Marc Overhage,et al.  A Context-sensitive Approach to Anonymizing Spatial Surveillance Data , 2006 .

[5]  L. Rudin,et al.  Nonlinear total variation based noise removal algorithms , 1992 .

[6]  Robert Tibshirani,et al.  Estimating the number of clusters in a data set via the gap statistic , 2000 .

[7]  C. Walck Hand-book on statistical distributions for experimentalists , 1996 .

[8]  Antonin Chambolle,et al.  On Total Variation Minimization and Surface Evolution Using Parametric Maximum Flows , 2009, International Journal of Computer Vision.

[9]  Nicholas A. Johnson,et al.  A Dynamic Programming Algorithm for the Fused Lasso and L 0-Segmentation , 2013 .

[10]  R. Tibshirani,et al.  The solution path of the generalized lasso , 2010, 1005.1971.

[11]  R. Tibshirani,et al.  Sparsity and smoothness via the fused lasso , 2005 .

[12]  Maya R. Gupta,et al.  Monotonic Calibrated Interpolated Look-Up Tables , 2015, J. Mach. Learn. Res..

[13]  Alexander J. Smola,et al.  Trend Filtering on Graphs , 2014, J. Mach. Learn. Res..

[14]  Suvrit Sra,et al.  Modular Proximal Optimization for Multidimensional Total-Variation Regularization , 2014, J. Mach. Learn. Res..