Topics in high-dimensional inference

ABSTRACT OF THE DISSERTATION

Topics in High-dimensional Inference
by Wenhua Jiang
Dissertation Director: Professor Cun-Hui Zhang

This thesis concerns three connected problems in high-dimensional inference: compound estimation of normal means, nonparametric regression, and penalization methods for variable selection.

In the first part of the thesis, we propose a general maximum likelihood empirical Bayes (GMLEB) method for the compound estimation of normal means. We prove that, under mild moment conditions on the unknown means, the GMLEB enjoys adaptive ratio optimality and adaptive minimaxity. Simulation experiments demonstrate that the GMLEB outperforms the James-Stein estimator and several state-of-the-art threshold estimators in a wide range of settings.

In the second part, we explore the GMLEB wavelet method for nonparametric regression. We show that the estimator is adaptive minimax in all Besov balls. Simulation experiments on the standard test functions demonstrate that the GMLEB outperforms several threshold estimators for moderate and large samples. Applications to high-throughput screening (HTS) data illustrate the excellent performance of the approach.

In the third part, we develop a generalized penalized linear unbiased selection (GPLUS) algorithm to compute the solution paths of concave-penalized negative
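To make the first part of the abstract concrete, the following is a minimal sketch of a GMLEB-style estimator, not the implementation used in the thesis. It assumes observations X_i ~ N(theta_i, 1), approximates the nonparametric maximum likelihood estimate of the prior on a fixed grid with EM iterations, and then plugs that estimate into the posterior mean. The function name `gmleb`, the grid spanning the data range, and the defaults `grid_size` and `n_iter` are illustrative assumptions.

```python
import numpy as np


def gmleb(x, grid_size=200, n_iter=500):
    """Grid-based sketch of a GMLEB estimate for X_i ~ N(theta_i, 1).

    Assumptions (not from the source): unit noise variance, a prior
    supported on an equally spaced grid over [min(x), max(x)], and a
    fixed number of EM iterations instead of a convergence check.
    """
    x = np.asarray(x, dtype=float)
    u = np.linspace(x.min(), x.max(), grid_size)   # support points of the prior
    w = np.full(grid_size, 1.0 / grid_size)        # initial prior weights

    # lik[i, j] = standard normal density of x_i centred at grid point u_j
    lik = np.exp(-0.5 * (x[:, None] - u[None, :]) ** 2) / np.sqrt(2.0 * np.pi)

    for _ in range(n_iter):
        mix = lik @ w                               # mixture density at each x_i
        w = w * (lik / mix[:, None]).mean(axis=0)   # EM update of the weights

    mix = lik @ w
    # Posterior mean of theta_i given x_i under the estimated prior
    return (lik @ (w * u)) / mix
```

On a sparse mean vector, for example mostly zeros with a few large entries, the resulting estimates shrink small observations toward zero while leaving large ones nearly untouched, which is the behaviour the comparison with the James-Stein and threshold estimators is about.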
