Learning Sparse Additive Models with Interactions in High Dimensions

A function $f: \mathbb{R}^d \rightarrow \mathbb{R}$ is referred to as a Sparse Additive Model (SPAM) if it is of the form $f(\mathbf{x}) = \sum_{l \in \mathcal{S}}\phi_{l}(x_l)$, where $\mathcal{S} \subset [d]$ and $|\mathcal{S}| \ll d$. Assuming the $\phi_l$'s and $\mathcal{S}$ to be unknown, the problem of estimating $f$ from its samples has been studied extensively. In this work, we consider a generalized SPAM that allows for second-order interaction terms. For some $\mathcal{S}_1 \subset [d]$ and $\mathcal{S}_2 \subset \binom{[d]}{2}$, the function $f$ is assumed to be of the form: $$f(\mathbf{x}) = \sum_{p \in \mathcal{S}_1}\phi_{p} (x_p) + \sum_{(l,l^{\prime}) \in \mathcal{S}_2}\phi_{(l,l^{\prime})} (x_{l},x_{l^{\prime}}).$$ Assuming $\phi_{p}$, $\phi_{(l,l^{\prime})}$, $\mathcal{S}_1$, and $\mathcal{S}_2$ to be unknown, we provide a randomized algorithm that queries $f$ and exactly recovers $\mathcal{S}_1$ and $\mathcal{S}_2$. This in turn enables us to estimate the underlying $\phi_p$ and $\phi_{(l,l^{\prime})}$. We derive sample complexity bounds for our scheme and extend our analysis to the setting where the queries are corrupted with noise, either stochastic or arbitrary but bounded. Lastly, we provide simulation results on synthetic data that validate our theoretical findings.
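To make the model and query setting concrete, the following Python sketch builds a toy instance of a SPAM with one second-order interaction and exposes it through a noisy query oracle, mirroring the synthetic-data setup. The dimension $d$, the active sets, the component functions, and the noise level are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative problem instance (all choices below are hypothetical, not
# the paper's): ambient dimension d, active sets S1 and S2, components.
d = 500
S1 = [3, 47]                              # active univariate coordinates, unknown to the learner
S2 = [(10, 200)]                          # active interaction pair, unknown to the learner

phi_uni = {3: np.sin, 47: np.tanh}        # phi_p for p in S1
phi_bi = {(10, 200): lambda u, v: u * v}  # phi_{(l,l')} for (l,l') in S2

def f(x):
    """Evaluate f(x) = sum_p phi_p(x_p) + sum_{(l,l')} phi_{(l,l')}(x_l, x_{l'})."""
    val = sum(phi_uni[p](x[p]) for p in S1)
    val += sum(phi_bi[(l, lp)](x[l], x[lp]) for (l, lp) in S2)
    return val

def noisy_query(x, sigma=0.01):
    """Return f(x) corrupted by additive Gaussian noise (the stochastic setting)."""
    return f(x) + sigma * rng.standard_normal()

# A recovery algorithm would choose query points x and observe noisy_query(x);
# here we just evaluate the oracle at a random point in [-1, 1]^d.
x = rng.uniform(-1.0, 1.0, size=d)
print(noisy_query(x))
```

For the arbitrary-but-bounded noise setting, the Gaussian perturbation above would simply be replaced by any adversarial term with magnitude at most some bound $\epsilon$.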
