Inference in adaptive regression via the Kac–Rice formula

Abstract: We derive an exact p-value for testing a global null hypothesis in a general adaptive regression setting. Our approach uses the Kac–Rice formula (as described in Adler & Taylor 2007) applied to the problem of maximizing a Gaussian process. The resulting test statistic has a known distribution in finite samples, assuming Gaussian errors. We examine this test statistic in the case of the lasso, group lasso, principal components, and matrix completion problems. For the lasso problem, our test relates closely to the recently proposed covariance test of Lockhart et al. (2013). Our approach also yields exact selective inference for the mean parameter at the global maximizer of the process.
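
To make the idea of an exact finite-sample p-value for the maximum of a Gaussian process concrete, here is a minimal sketch of the simplest lasso-type special case. It rests on assumptions not stated in the abstract: an orthonormal design, a known noise level sigma, and a ratio-of-tails form for the statistic. Under those assumptions the scores z_j = x_j^T y are i.i.d. N(0, sigma^2) under the global null, and the largest |z_j|, conditional on the second largest, follows a truncated half-normal law, so the ratio below is exactly uniform on (0, 1). The function names are hypothetical and this is an illustration, not the paper's general construction.

```python
import math
import random


def normal_sf(x):
    """Upper tail P(Z > x) of a standard normal, via the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def first_knot_pvalue(z, sigma=1.0):
    """Global-null p-value from scores z_j = x_j^T y.

    Assumes orthonormal columns x_j and known sigma (illustrative assumptions).
    lam1 and lam2 are the largest and second-largest absolute scores.
    """
    a = sorted((abs(v) for v in z), reverse=True)
    lam1, lam2 = a[0], a[1]
    # Tail probability of the maximum, conditional on exceeding the runner-up.
    return normal_sf(lam1 / sigma) / normal_sf(lam2 / sigma)


def simulate_null(n_rep=2000, p=10, seed=0):
    """Draw p-values under the global null (pure-noise scores with sigma = 1)."""
    rng = random.Random(seed)
    return [
        first_knot_pvalue([rng.gauss(0.0, 1.0) for _ in range(p)])
        for _ in range(n_rep)
    ]
```

Calling `simulate_null()` and checking that the resulting p-values look roughly uniform is a quick sanity check of the conditioning argument; a large leading score such as `first_knot_pvalue([3.0, 1.0, -0.5])` gives a small p-value.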

[1]  David R. Brillinger,et al.  On the Number of Solutions of Systems of Random Equations , 1972 .

[2]  A. Takemura.  Weights of $\bar{\chi}^2$ distribution for smooth or piecewise smooth cone alternatives , 1995 .

[3]  R. Tibshirani.  Regression Shrinkage and Selection via the Lasso , 1996 .

[4]  Akimichi Takemura,et al.  Weights of $\bar{\chi}^2$ distribution for smooth or piecewise smooth cone alternatives , 1997 .

[5]  I. Johnstone.  On the distribution of the largest eigenvalue in principal components analysis , 2001 .

[6]  Akimichi Takemura,et al.  On the equivalence of the tube and Euler characteristic methods for the distribution of the maximum of Gaussian fields over piecewise smooth domains , 2002 .

[7]  Akimichi Takemura,et al.  Validity of the expected Euler characteristic heuristic , 2003 .

[8]  R. Tibshirani,et al.  Least angle regression , 2004, math/0406456.

[9]  R. Tibshirani,et al.  Sparsity and smoothness via the fused lasso , 2005 .

[10]  Jean-Marc Azaïs, Mario Wschebor.  A general expression for the distribution of the maximum of a Gaussian field and the approximation of the tail , 2006, math/0607041.

[11]  Stephen P. Boyd,et al.  Convex Optimization , 2004, Algorithms and Theory of Computation Handbook.

[12]  M. Yuan,et al.  Model selection and estimation in regression with grouped variables , 2006 .

[13]  R. Adler,et al.  Random Fields and Geometry , 2007 .

[14]  J. Azaïs,et al.  Erratum to: “A general expression for the distribution of the maximum of a Gaussian field and the approximation of the tail” [Stochastic Process. Appl. 118 (7) (2008) 1190–1218] , 2010 .

[15]  Robert Tibshirani,et al.  Spectral Regularization Algorithms for Learning Large Incomplete Matrices , 2010, J. Mach. Learn. Res..

[16]  E. Candès,et al.  Global testing under sparse alternatives: ANOVA, multiple comparisons and the higher criticism , 2010, 1007.1434.

[17]  R. Tibshirani,et al.  The solution path of the generalized lasso , 2010, 1005.1971.

[18]  Stephen P. Boyd,et al.  Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers , 2011, Found. Trends Mach. Learn..

[19]  Emmanuel J. Candès,et al.  Exact Matrix Completion via Convex Optimization , 2008, Found. Comput. Math..

[20]  I. Johnstone,et al.  Augmented sparse principal component analysis for high dimensional data , 2012, 1202.1242.

[21]  R. Tibshirani.  The Lasso Problem and Uniqueness , 2012, 1206.0313.

[22]  Trevor Hastie,et al.  Learning interactions through hierarchical group-lasso regularization , 2013, 1308.2719.

[23]  Dennis L. Sun,et al.  Exact post-selection inference with the lasso , 2013 .

[24]  Emmanuel J. Candès,et al.  Unbiased Risk Estimates for Singular Value Thresholding and Spectral Estimators , 2012, IEEE Transactions on Signal Processing.

[25]  Jonathan E. Taylor.  The geometry of least squares in the 21st century , 2013, 1309.7837.

[26]  R. Tibshirani,et al.  Exact Post-Selection Inference for Sequential Regression Procedures , 2014, 1401.3889.

[27]  R. Tibshirani,et al.  Selecting the number of principal components: estimation of the true rank of a noisy matrix , 2014, 1410.8260.

[28]  Dennis L. Sun,et al.  Optimal Inference After Model Selection , 2014, 1410.2597.

[29]  R. Tibshirani,et al.  A significance test for the lasso , 2013, Annals of Statistics.

[30]  Joshua R. Loftus,et al.  A significance test for forward stepwise model selection , 2014, 1405.3920.

[31]  R. Tibshirani,et al.  Exact Post-selection Inference for Forward Stepwise and Least Angle Regression , 2014 .

[32]  J. Zhu,et al.  On the degrees of freedom of reduced-rank estimators in multivariate regression. , 2012, Biometrika.