Maximum-Variance Total Variation Denoising for Interpretable Spatial Smoothing

We consider the problem of spatial regression where interpretability of the model is a high priority. Such problems appear frequently in fields as diverse as climatology, epidemiology, and predictive policing. For cognitive, logistical, and organizational reasons, humans tend to infer regions or neighborhoods of constant value, often with sharp discontinuities between regions, and then allocate resources on a per-region basis. Automating this smoothing process presents a unique challenge for spatial smoothing algorithms, which tend to assume stationarity and smoothness everywhere. To address this problem, we propose Maximum Variance Total Variation (MVTV) denoising, a novel method for interpretable nonlinear spatial regression. MVTV divides the feature space into blocks of constant value and smooths the values of all blocks jointly via a convex optimization routine. Our method is fully data-adaptive and incorporates highly robust routines for tuning all hyperparameters automatically. We compare our approach against the existing CART and CRISP methods via both a complexity-accuracy tradeoff metric and a human study, demonstrating that MVTV is a more powerful and interpretable method.
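The two-stage idea in the abstract (bin the feature space into blocks, then smooth all block values jointly under a total variation penalty so neighboring blocks fuse to a shared constant) can be illustrated with a minimal sketch. This is not the authors' implementation: the toy data, the grid size `k`, the penalty weight `lam`, and the use of a Huber-smoothed TV term solved with `scipy.optimize.minimize` are all assumptions made for illustration; the paper solves an exact (non-smoothed) convex TV problem with specialized routines.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical toy data: noisy samples of a piecewise-constant surface
# with a sharp discontinuity along x = 0.5 and y = 0.5.
rng = np.random.default_rng(0)
n = 400
X = rng.uniform(0, 1, size=(n, 2))
y_true = np.where((X[:, 0] > 0.5) ^ (X[:, 1] > 0.5), 1.0, -1.0)
y = y_true + rng.normal(scale=0.3, size=n)

# Step 1: divide the feature space into a k x k grid of blocks and
# average the responses falling in each block.
k = 8
ix = np.minimum((X[:, 0] * k).astype(int), k - 1)
iy = np.minimum((X[:, 1] * k).astype(int), k - 1)
counts = np.zeros((k, k))
sums = np.zeros((k, k))
np.add.at(counts, (ix, iy), 1)
np.add.at(sums, (ix, iy), y)
cell_means = np.where(counts > 0, sums / np.maximum(counts, 1), 0.0)

# Step 2: smooth all block values jointly. The data-fit term is weighted
# by the block counts (empty blocks are interpolated by the penalty), and
# the Huber-smoothed TV term on the grid graph pushes adjacent blocks to
# fuse to a common value, yielding sharp region boundaries.
lam, eps = 0.5, 1e-4

def objective(b_flat):
    B = b_flat.reshape(k, k)
    fit = np.sum(counts * (B - cell_means) ** 2)
    dh = B[1:, :] - B[:-1, :]          # differences between vertical neighbors
    dv = B[:, 1:] - B[:, :-1]          # differences between horizontal neighbors
    tv = np.sum(np.sqrt(dh ** 2 + eps)) + np.sum(np.sqrt(dv ** 2 + eps))
    return fit + lam * tv

res = minimize(objective, cell_means.ravel(), method="L-BFGS-B")
B_hat = res.x.reshape(k, k)            # smoothed, near-piecewise-constant fit
```

Because the objective is convex, any local solver recovers the global smoothed fit; larger `lam` fuses more blocks, trading accuracy for a simpler, more interpretable partition.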
