Large-scale inference

Apart from a more philosophical paragraph in the final chapter on the distinction between machine learning and statistical analysis, which notes the drawback of using neural nets and similar tools as black-box methods (Page 185), the book offers relatively little coverage of nonparametric models; the preference for “parametric formulations” (Page 96) is openly stated. I can to some extent understand this perspective for simpler settings, where nonparametric models offer little explanation of how the data were produced. In more complex models, however, nonparametric components are often a convenient way to dispose of burdensome nuisance parameters. Again, technical aspects are not the focus of Principles of Applied Statistics, which also explains why it does not dwell on nonparametric models.
