Screening for Differential Gene Expressions from Microarray Data

Living organisms need proteins to provide structure, such as skin and bone, and to provide function through, for example, hormones and enzymes. Genes are first transcribed to messenger RNA, which is then translated to proteins. Even though every cell of an organism contains the full set of genes for that organism, only a small subset of the genes is active in each cell. The levels at which the different genes are active in various cell types (their expression levels) can all be screened simultaneously using microarrays. We discuss the design of two-channel microarray experiments and illustrate the ideas through the analysis of data from a designed microarray experiment on gene expression in liver and muscle tissue. The number of genes screened in a microarray experiment can be in the thousands or tens of thousands. It is therefore important to adjust for the multiplicity of comparisons of gene expression levels: without adjustment, the more genes that are screened, the more likely it is that incorrect statistical inferences occur. Different purposes of gene expression experiments may call for different control of multiple comparison error rates. We illustrate how control of the statistical error rate translates into control of the rate of incorrect biological decisions. We discuss the pros and cons of two forms of multiple comparisons inference: testing for significant difference and providing confidence bounds. Two multiple testing principles are described: closed testing and partitioning. Stepdown testing, a popular form of gene expression analysis, is shown to be a shortcut to closed and partitioning testing. We give a set of conditions for such a shortcut to be valid.
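As a minimal sketch of what a stepdown shortcut to closed testing can look like, the Python code below uses Holm's stepdown procedure. It is a standard textbook example chosen here for illustration, not necessarily the procedure analyzed in the chapter. The code compares it with a brute-force closed test in which every intersection hypothesis is tested by a Bonferroni local test. The function names, the example p-values, and the level `alpha` are all assumptions made for the illustration.

```python
from itertools import combinations
from typing import List


def holm_stepdown(p_values: List[float], alpha: float = 0.05) -> List[bool]:
    """Holm's stepdown test: visit p-values from smallest to largest,
    stop at the first one that fails its threshold alpha / (m - step)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for step, idx in enumerate(order):
        if p_values[idx] <= alpha / (m - step):
            reject[idx] = True
        else:
            break  # all remaining (larger) p-values are retained
    return reject


def closed_bonferroni(p_values: List[float], alpha: float = 0.05) -> List[bool]:
    """Brute-force closed testing: reject H_i only if every intersection
    hypothesis containing i is rejected by a Bonferroni local test.
    Cost grows as 2^m, so this is only feasible for a handful of genes."""
    m = len(p_values)
    reject = []
    for i in range(m):
        rejected_everywhere = True
        for size in range(1, m + 1):
            for subset in combinations(range(m), size):
                if i not in subset:
                    continue
                # Bonferroni local test of the intersection hypothesis
                if min(p_values[j] for j in subset) > alpha / size:
                    rejected_everywhere = False
        reject.append(rejected_everywhere)
    return reject


if __name__ == "__main__":
    # Hypothetical per-gene p-values, for illustration only
    p = [0.001, 0.04, 0.03, 0.2]
    print(holm_stepdown(p))
    print(closed_bonferroni(p))
```

On these inputs the two functions return the same decisions, but the stepdown version needs at most m comparisons while the closed test enumerates all 2^m - 1 intersection hypotheses. That difference is what makes a shortcut attractive when thousands of genes are screened.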
