Model Selection and Multiple Testing - A Bayesian and Empirical Bayes Overview and some New Results

We provide a brief overview of both Bayesian and classical model selection. We argue tentatively that model selection has at least two major goals, finding the correct model and predicting well, and that in general a single model selection rule may not achieve both goals optimally. Through a study of well-known model selection rules such as AIC, BIC, DIC and the Lasso, we discuss, briefly but critically, how these different goals are pursued in each paradigm. We introduce some new definitions of consistency, together with results and conjectures about consistency in high-dimensional model selection problems. Finally, we discuss some new or recent results in Full Bayes and Empirical Bayes multiple testing, and in cross-validation. We show that when the number of parameters tends to infinity at a slower rate than the sample size, it is best from the point of view of consistency to use most of the data for inference and only a negligible proportion to make an improper prior proper.
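As a rough illustration of how two of the rules named above differ, the sketch below computes AIC and BIC for nested polynomial regressions fitted by least squares. It is not taken from the paper: the function names `gaussian_ic` and `select_degree`, the Gaussian error model, and the convention of counting the error variance as one parameter are all assumptions made for this example.

```python
import numpy as np


def gaussian_ic(y, X):
    """Return (AIC, BIC) for a least squares fit, assuming Gaussian errors."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    # Maximized Gaussian log-likelihood with sigma^2 estimated by rss / n.
    loglik = -0.5 * n * (np.log(2.0 * np.pi * rss / n) + 1.0)
    p = k + 1  # assumption: k coefficients plus the error variance
    aic = -2.0 * loglik + 2.0 * p
    bic = -2.0 * loglik + p * np.log(n)
    return aic, bic


def select_degree(y, x, max_degree, criterion="bic"):
    """Pick the polynomial degree in 0..max_degree minimizing AIC or BIC."""
    scores = {}
    for d in range(max_degree + 1):
        # Columns 1, x, x^2, ..., x^d
        X = np.vander(x, d + 1, increasing=True)
        aic, bic = gaussian_ic(y, X)
        scores[d] = bic if criterion == "bic" else aic
    best = min(scores, key=scores.get)
    return best, scores
```

The two criteria share the same fit term and differ only in the penalty per parameter, 2 for AIC against log n for BIC. Once log n exceeds 2, BIC therefore never selects a larger model than AIC among the same candidates, which is one simple way to see the tension between the goal of identifying the correct model and the goal of predicting well.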

[45]  I. Johnstone,et al.  Needles and straw in haystacks: Empirical Bayes estimates of possibly sparse sequences , 2004, math/0410088.

[46]  Model selection for high dimensional problems with application to function estimation , 2004 .

[47]  Larry Wasserman,et al.  All of Statistics: A Concise Course in Statistical Inference , 2004 .

[48]  J. Ghosh,et al.  Modifying the Schwarz Bayesian Information Criterion to Locate Multiple Interacting Quantitative Trait Loci , 2004, Genetics.

[49]  Jaeyong Lee,et al.  A note on the consistency of Bayes factors for testing point null versus non-parametric alternatives , 2004 .

[50]  Gordon K Smyth,et al.  Statistical Applications in Genetics and Molecular Biology Linear Models and Empirical Bayes Methods for Assessing Differential Expression in Microarray Experiments , 2011 .

[51]  I. Johnstone,et al.  Adapting to unknown sparsity by controlling the false discovery rate , 2005, math/0505374.

[52]  Y. Benjamini,et al.  False Discovery Rate–Adjusted Multiple Confidence Intervals for Selected Parameters , 2005 .

[53]  Susmita Datta,et al.  Empirical Bayes screening of many p-values with applications to microarray studies , 2005, Bioinform..

[54]  Berwin A. Turlach,et al.  On algorithms for solving least squares problems under an L1 penalty or an L1 constraint , 2005 .

[55]  J. Ghosh,et al.  Some Bayesian predictive approaches to model selection , 2005 .

[56]  M. Wegkamp,et al.  Consistent variable selection in high dimensional regression via multiple testing , 2006 .

[57]  T. Cai,et al.  Estimating the Null and the Proportion of Nonnull Effects in Large-Scale Multiple Comparisons , 2006, math/0611108.

[58]  Jianqing Fan,et al.  Sure independence screening for ultrahigh dimensional feature space , 2006, math/0612857.

[59]  James G. Scott,et al.  An exploration of aspects of Bayesian multiple testing , 2006 .

[60]  N. Meinshausen,et al.  High-dimensional graphs and variable selection with the Lasso , 2006, math/0608017.

[61]  J. Ghosh,et al.  An Introduction to Bayesian Analysis: Theory and Methods , 2006 .

[62]  M. Newton,et al.  Estimating the Integrated Likelihood via Posterior Simulation Using the Harmonic Mean Identity , 2006 .

[63]  Peng Zhao,et al.  On Model Selection Consistency of Lasso , 2006, J. Mach. Learn. Res..

[64]  Yuhong Yang CONSISTENCY OF CROSS VALIDATION FOR COMPARING REGRESSION PROCEDURES , 2007, 0803.2963.

[65]  Terence Tao,et al.  The Dantzig selector: Statistical estimation when P is much larger than n , 2005, math/0506081.

[66]  Jayanta K. Ghosh,et al.  On the Empirical Bayes approach to the problem of multiple testing , 2007, Qual. Reliab. Eng. Int..

[67]  John D. Storey The optimal discovery procedure: a new approach to simultaneous significance testing , 2007 .

[68]  A. Tsybakov,et al.  Sparsity oracle inequalities for the Lasso , 2007, 0705.3308.

[69]  Wenguang Sun,et al.  Oracle and Adaptive Compound Decision Rules for False Discovery Rate Control , 2007 .

[70]  J. Ghosh,et al.  Extending the Modified Bayesian Information Criterion (mBIC) to Dense Markers and Multiple Interval Mapping , 2008, Biometrics.

[71]  G. Casella,et al.  The Bayesian Lasso , 2008 .

[72]  Bradley Efron,et al.  Microarrays, Empirical Bayes and the Two-Groups Model. Rejoinder. , 2008, 0808.0572.

[73]  Jeffrey S. Morris,et al.  Sure independence screening for ultrahigh dimensional feature space Discussion , 2008 .

[74]  Jayanta K. Ghosh,et al.  Selecting explanatory variables with the modified version of the Bayesian information criterion , 2008, Qual. Reliab. Eng. Int..

[75]  M. Clyde,et al.  Mixtures of g Priors for Bayesian Variable Selection , 2008 .

[76]  J. Ghosh,et al.  A comparison of the Benjamini-Hochberg procedure with some Bayesian rules for multiple testing , 2008, 0805.2479.

[77]  R. Ramamoorthi,et al.  Remarks on consistency of posterior distributions , 2008, 0805.3248.

[78]  Martin J. Wainwright,et al.  Information-theoretic limits on sparsity recovery in the high-dimensional and noisy setting , 2009, IEEE Trans. Inf. Theory.

[79]  P. Bickel,et al.  SIMULTANEOUS ANALYSIS OF LASSO AND DANTZIG SELECTOR , 2008, 0801.1095.

[80]  David Draper,et al.  Calibration results for Bayesian model specication , 2010 .

[81]  Malgorzata Bogdan,et al.  Asymptotic Bayes optimality under sparsity for generally distributed effect sizes under the alternative , 2010, 1005.4753.

[82]  G. Casella,et al.  CONSISTENCY OF OBJECTIVE BAYES FACTORS AS THE MODEL DIMENSION GROWS , 2010, 1010.3821.

[83]  James G. Scott,et al.  Bayes and empirical-Bayes multiplicity adjustment in the variable-selection problem , 2010, 1011.2333.

[84]  G. Casella,et al.  Penalized regression, standard errors, and Bayesian lassos , 2010 .

[85]  J. Ghosh,et al.  Bayes Model Selection with Path Sampling: Factor Models and Other Examples , 2010, 1008.4373.

[86]  Wenge Guo,et al.  Controlling False Discoveries in Multidimensional Directional Decisions, with Applications to Gene Expression Data on Ordered Categories , 2010, Biometrics.

[87]  Jiashun Jin,et al.  Optimal rates of convergence for estimating the null density and proportion of nonnull effects in large-scale multiple testing , 2010, 1001.1609.

[88]  J. Ghosh,et al.  The Bayes oracle and asymptotic optimality of multiple testing procedures under sparsity , 2010 .

[89]  M. Bogdan,et al.  A model selection approach to genome wide association studies , 2010, 1010.0124.

[90]  J. Ghosh,et al.  AIC, BIC and Recent Advances in Model Selection , 2011 .

[91]  Malgorzata Bogdan,et al.  Modified versions of the Bayesian Information Criterion for sparse Generalized Linear Models , 2011, Comput. Stat. Data Anal..

[92]  Author D. R. Cox Further Results on Tests of Separate Families of Hypotheses , 2017 .