Bootstrapping max statistics in high dimensions: Near-parametric rates under weak variance decay and application to functional and multinomial data

In recent years, bootstrap methods have drawn attention for their ability to approximate the laws of "max statistics" in high-dimensional problems. A leading example of such a statistic is the coordinate-wise maximum of a sample average of $n$ random vectors in $\mathbb{R}^p$. Existing results for this statistic show that the bootstrap can work when $n\ll p$, and rates of approximation (in Kolmogorov distance) have been obtained with only logarithmic dependence on $p$. Nevertheless, one of the challenging aspects of this setting is that established rates tend to scale like $n^{-1/6}$ as a function of $n$. The main purpose of this paper is to demonstrate that an improvement in the rate is possible when extra model structure is available. Specifically, we show that if the coordinate-wise variances of the observations exhibit decay, then a nearly $n^{-1/2}$ rate can be achieved, independent of $p$. Furthermore, a surprising aspect of this dimension-free rate is that it holds even when the decay is very weak. Lastly, we provide examples showing how these ideas can be applied to inference problems dealing with functional and multinomial data.
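
As a rough illustration of the setting described above, the following sketch bootstraps the coordinate-wise maximum of a centered, $\sqrt{n}$-scaled sample average when $n\ll p$. It is only a minimal sketch under stated assumptions: it uses the generic nonparametric (resample-with-replacement) bootstrap, which is not necessarily the exact procedure analyzed in the paper, and the function names, the Gaussian data, and the polynomial decay profile `sigma_j = j**(-alpha)` are all hypothetical choices made for illustration.

```python
import numpy as np

def max_statistic(x, center):
    """Return sqrt(n) * max_j (xbar_j - center_j) for an (n, p) sample x."""
    n = x.shape[0]
    return np.sqrt(n) * np.max(x.mean(axis=0) - center)

def bootstrap_max_quantile(x, level=0.95, n_boot=500, seed=0):
    """Estimate the `level` quantile of the max statistic by resampling rows."""
    rng = np.random.default_rng(seed)
    n = x.shape[0]
    xbar = x.mean(axis=0)  # bootstrap world is centered at the sample mean
    stats = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # draw n row indices with replacement
        stats[b] = max_statistic(x[idx], xbar)
    return np.quantile(stats, level)

# Illustrative data (assumption): n << p, with coordinate standard
# deviations decaying like j^(-alpha) for a small exponent alpha.
rng = np.random.default_rng(1)
n, p, alpha = 100, 1000, 0.5
sigma = np.arange(1, p + 1) ** (-alpha)
x = rng.standard_normal((n, p)) * sigma

q = bootstrap_max_quantile(x)
print(f"bootstrap 95% quantile of the max statistic: {q:.3f}")
```

The returned quantile could serve, for example, as the half-width multiplier of a one-sided simultaneous confidence bound for the $p$ coordinate means; how accurate such a bound is for a given $n$ is exactly the question that the rates in the abstract address.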
