A mixture model approach to detecting differentially expressed genes with microarray data

An exciting biological advancement over the past few years is the use of microarray technologies to measure the expression levels of thousands of genes simultaneously. The bottleneck now is extracting useful information from the resulting large amounts of data. An important and common task in analyzing microarray data is to identify genes with altered expression under two experimental conditions. We propose a nonparametric statistical approach, called the mixture model method (MMM), to handle the problem when there is only a small number of replicates under each experimental condition. Specifically, we propose estimating the distributions of a t-type test statistic and its null statistic using finite normal mixture models. Comparing these two distributions by means of a likelihood ratio test, or simply using the tail distribution of the null statistic, identifies genes with significantly changed expression. Several methods are proposed to control false positives effectively. The methodology is applied to a data set containing expression levels of 1,176 genes of rats with and without pneumococcal middle ear infection.
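The tail-distribution variant described above can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not specify how the null statistic is built, so the balanced within-condition sign flip used here is an assumption, as are the simulated data, the two-component mixture, the level `alpha = 0.01`, and every function name. The sketch computes a t-type statistic per gene, builds a null statistic, fits a finite normal mixture to the null statistic by EM, and flags genes whose statistic falls in the fitted null tail.

```python
import numpy as np
from math import erf, sqrt

def pooled_se(a, b):
    # Standard error of the difference in means, per gene (row).
    return np.sqrt(a.var(axis=1, ddof=1) / a.shape[1] + b.var(axis=1, ddof=1) / b.shape[1])

def t_stat(a, b):
    # t-type test statistic per gene.
    return (a.mean(axis=1) - b.mean(axis=1)) / pooled_se(a, b)

def null_stat(a, b, rng):
    # Assumption: balanced random +/-1 signs within each condition cancel any
    # mean difference, giving a statistic that mimics the no-change case.
    # Requires an even number of replicates per condition.
    sa = rng.permutation(np.repeat([1.0, -1.0], a.shape[1] // 2))
    sb = rng.permutation(np.repeat([1.0, -1.0], b.shape[1] // 2))
    return ((a * sa).mean(axis=1) - (b * sb).mean(axis=1)) / pooled_se(a, b)

def fit_normal_mixture(x, k=2, n_iter=200):
    # Plain EM for a k-component one-dimensional normal mixture.
    x = np.asarray(x, dtype=float)
    w = np.full(k, 1.0 / k)
    mu = np.quantile(x, (np.arange(k) + 0.5) / k)
    sd = np.full(k, x.std())
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each point.
        dens = w * np.exp(-0.5 * ((x[:, None] - mu) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))
        resp = dens / (dens.sum(axis=1, keepdims=True) + 1e-300)
        # M-step: update weights, means, and standard deviations.
        nk = resp.sum(axis=0) + 1e-12
        w = nk / nk.sum()
        mu = (resp * x[:, None]).sum(axis=0) / nk
        sd = np.maximum(np.sqrt((resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk), 1e-3)
    return w, mu, sd

def null_tail(c, w, mu, sd):
    # Two-sided tail probability P(|z| > c) under the fitted mixture.
    def cdf(t):
        return sum(wi * 0.5 * (1.0 + erf((t - mi) / (si * sqrt(2.0))))
                   for wi, mi, si in zip(w, mu, sd))
    return cdf(-c) + 1.0 - cdf(c)

def null_cutoff(alpha, w, mu, sd, hi=50.0):
    # Bisection for the cutoff c with null_tail(c) = alpha.
    lo = 0.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if null_tail(mid, w, mu, sd) > alpha:
            lo = mid
        else:
            hi = mid
    return hi

# Simulated data (hypothetical): 1000 genes, 4 replicates per condition,
# the first 50 genes shifted upward under condition 1.
rng = np.random.default_rng(0)
n_genes, n_de, reps = 1000, 50, 4
cond1 = rng.normal(size=(n_genes, reps))
cond2 = rng.normal(size=(n_genes, reps))
cond1[:n_de] += 3.0

Z = t_stat(cond1, cond2)          # observed statistics
z = null_stat(cond1, cond2, rng)  # null statistics
w, mu, sd = fit_normal_mixture(z, k=2)
alpha = 0.01
cutoff = null_cutoff(alpha, w, mu, sd)
flagged = np.abs(Z) > cutoff
print(f"cutoff={cutoff:.3f}, flagged={int(flagged.sum())}")
```

With so few replicates, the sign-flipped statistic only approximates the true null distribution, so the cutoff here is illustrative rather than a calibrated false positive rate. The likelihood ratio comparison of the two fitted mixtures is not shown.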
