Publishes Results of a Wide Variety of Studies from Human and from Informative Model Systems with Physiological Genomics

Gene expression microarrays have been the vanguard of new analytic approaches in high-dimensional biology. Draft sequences of several genomes, coupled with new technologies, allow the study of the influences and responses of entire genomes rather than isolated genes. This has opened a new realm of high-dimensional biology in which questions involve multiplicity at unprecedented scales: thousands of genetic polymorphisms, gene expression levels, protein measurements, genetic sequences, or any combination of these and their interactions. Such situations demand creative approaches to inference, estimation, prediction, classification, and study design. Although bench scientists intuitively grasp the need for flexibility in the inferential process, the elaboration of formal supporting statistical frameworks has only just begun. Here, we will discuss some of the unique statistical challenges facing investigators studying high-dimensional biology, describe some approaches being developed by statistical scientists, and offer an epistemological framework for the validation of proffered statistical procedures. A key theme will be the challenge of providing methods that a statistician judges to be sound and a biologist finds informative. The shift from family-wise error rate control to false discovery rate estimation, and then to the assessment of ranking and other forms of stability, will be portrayed as illustrative of approaches to this challenge.
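The contrast between family-wise error rate control and false discovery rate procedures can be made concrete with a small sketch. The abstract does not say which procedures are meant, so the following assumes two common representatives: the Bonferroni correction for the family-wise error rate and the Benjamini-Hochberg step-up procedure for the false discovery rate. The function names and the p-values are hypothetical and chosen only for illustration.

```python
def bonferroni_reject(pvalues, alpha=0.05):
    """Reject H_i when p_i <= alpha / m (controls the family-wise error rate)."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]


def benjamini_hochberg_reject(pvalues, alpha=0.05):
    """Benjamini-Hochberg step-up procedure (assumed FDR method, not from the source)."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest rank k such that p_(k) <= (k / m) * alpha.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * alpha:
            k_max = rank
    # Reject the k_max hypotheses with the smallest p-values.
    reject = [False] * m
    for idx in order[:k_max]:
        reject[idx] = True
    return reject


# Hypothetical p-values for eight genes (illustrative only, not real data).
pvals = [0.001, 0.008, 0.012, 0.03, 0.2, 0.5, 0.7, 0.9]

n_bonferroni = sum(bonferroni_reject(pvals))
n_bh = sum(benjamini_hochberg_reject(pvals))
print("Bonferroni rejections:", n_bonferroni)
print("Benjamini-Hochberg rejections:", n_bh)
```

On these made-up values the step-up procedure rejects more hypotheses than Bonferroni at the same nominal level, which is the kind of trade-off between stringency and informativeness that the abstract alludes to.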
