Asymptotic inference for high-dimensional data

In this paper, we study inference for high-dimensional data characterized by small sample sizes relative to the dimension of the data. In particular, we provide an infinite-dimensional framework for statistical models in which (i) the number of parameters increases with the sample size (and is allowed to be random) and (ii) data may be missing. Under a variety of tail conditions on the components of the data, we provide precise conditions for the joint consistency of the estimators of the mean. In the process, we clarify and improve some recent consistency results in the literature. An important aspect of this work is the development of asymptotic normality results for these models. As a consequence, we construct several test statistics for one-sample and two-sample problems concerning the mean vector and obtain their asymptotic distributions as corollaries of the infinite-dimensional results. Finally, we use these theoretical results to develop an asymptotically justified methodology for data analysis. The simulation results presented here describe situations in which the methodology can be applied successfully. They also evaluate its robustness under a variety of conditions, some of which differ substantially from the technical conditions assumed in the theory. Comparisons with other methods from the literature are provided. Analyses of real-life data are also included.
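To make the setting concrete, here is a minimal sketch. It is not the paper's estimators or test statistics. It assumes the data are stored as rows with `None` marking a missing component, and it estimates each coordinate of the mean from the observations available for that coordinate, so the per-coordinate sample size n_j is random. It then forms a max-type standardized statistic for the one-sample hypothesis H0: mu = mu0. The names `componentwise_summary` and `max_standardized_mean` are hypothetical, and the max-type statistic is only a generic illustrative choice. The asymptotic distributions derived in the paper need not apply to it.

```python
import math
import random
from typing import List, Optional, Tuple

# Hypothetical data layout: one list per observation, None marks a missing component.
Row = List[Optional[float]]


def componentwise_summary(data: List[Row]) -> Tuple[List[float], List[float], List[int]]:
    """Per-coordinate sample mean, sample variance and observed count n_j.

    Coordinate j uses only the rows in which it is observed, so n_j is
    random whenever the missingness is random. The dimension p is read
    from the first row.
    """
    p = len(data[0])
    counts = [0] * p
    means = [0.0] * p
    m2 = [0.0] * p  # running sums of squared deviations (Welford's update)
    for row in data:
        for j, x in enumerate(row):
            if x is None:
                continue  # missing entry: coordinate j skips this row
            counts[j] += 1
            delta = x - means[j]
            means[j] += delta / counts[j]
            m2[j] += delta * (x - means[j])
    variances = [m2[j] / (counts[j] - 1) if counts[j] > 1 else float("nan")
                 for j in range(p)]
    return means, variances, counts


def max_standardized_mean(data: List[Row], mu0: List[float]) -> float:
    """Illustrative one-sample statistic for H0: mu = mu0 (not the paper's):
    max over j of sqrt(n_j) * |xbar_j - mu0_j| / s_j, using only coordinates
    with n_j > 1 and positive sample variance.
    """
    means, variances, counts = componentwise_summary(data)
    stats = [math.sqrt(n) * abs(m - mu) / math.sqrt(v)
             for m, v, n, mu in zip(means, variances, counts, mu0)
             if n > 1 and v > 0]
    if not stats:
        raise ValueError("no coordinate has two or more observations")
    return max(stats)


if __name__ == "__main__":
    random.seed(0)
    n, p = 20, 500  # sample size small relative to the dimension
    # Each entry is N(0, 1) and independently missing with probability 0.1.
    data = [[random.gauss(0.0, 1.0) if random.random() > 0.1 else None
             for _ in range(p)] for _ in range(n)]
    _, _, counts = componentwise_summary(data)
    print("observed counts n_j range from", min(counts), "to", max(counts))
    print("max standardized mean under H0:", max_standardized_mean(data, [0.0] * p))
```

In this simulated example, n = 20 is much smaller than p = 500, and each coordinate's mean is estimated from its own random count n_j. Welford's update computes the means and variances in a single pass over the data.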
