A mixture model-based strategy for selecting sets of genes in multiclass response microarray experiments

MOTIVATION: Multiclass response (MCR) experiments are those in which more than two classes are compared. In these experiments, although the null hypothesis is simple, there are typically many patterns of gene expression changes across the different classes, which lead to complex alternatives. In this paper, we propose a new strategy for selecting genes in MCR experiments that is based on a flexible mixture model for the marginal distribution of a modified F-statistic. Using this model, false positive and false negative discovery rates can be estimated and combined to produce a rule for selecting a subset of genes. Moreover, the proposed method allows these rates to be calculated for any predefined subset of genes.

RESULTS: We illustrate the performance of our approach using simulated datasets and a real breast cancer microarray dataset. In the latter study, we investigate predefined subsets of genes and point out interesting differences between three distinct biological pathways.

AVAILABILITY: http://www.bgx.org.uk/software.html
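As a rough illustration of how the two rates could be estimated and combined, the sketch below assumes that a fitted mixture model has already supplied, for each gene, a posterior probability of belonging to the null component. The function names (`estimate_rates`, `choose_cutoff`), the grid of cutoffs, and the use of the unweighted sum of the two rates as the selection criterion are assumptions made for this example; the abstract does not specify them.

```python
from typing import List, Optional, Sequence, Tuple


def estimate_rates(null_probs: Sequence[float],
                   selected: Sequence[bool]) -> Tuple[float, float]:
    """Estimate (FDR, FNR) for a given gene subset.

    null_probs[i] is the posterior probability that gene i is null
    (assumed to come from an already fitted mixture model).
    selected[i] is True if gene i is in the subset under study.
    """
    # Estimated FDR: average null probability among selected genes.
    in_set = [p for p, s in zip(null_probs, selected) if s]
    # Estimated FNR: average non-null probability among the remaining genes.
    out_set = [1.0 - p for p, s in zip(null_probs, selected) if not s]
    fdr = sum(in_set) / len(in_set) if in_set else 0.0
    fnr = sum(out_set) / len(out_set) if out_set else 0.0
    return fdr, fnr


def choose_cutoff(null_probs: Sequence[float],
                  grid: Sequence[float]) -> Optional[Tuple[float, float, float]]:
    """Pick the cutoff on the null probability that minimizes FDR + FNR.

    The unweighted sum is a hypothetical criterion chosen for this sketch.
    Returns (cutoff, fdr, fnr), or None if the grid is empty.
    """
    best: Optional[Tuple[float, float, float]] = None
    best_score = float("inf")
    for cutoff in grid:
        selected: List[bool] = [p <= cutoff for p in null_probs]
        fdr, fnr = estimate_rates(null_probs, selected)
        if fdr + fnr < best_score:
            best_score = fdr + fnr
            best = (cutoff, fdr, fnr)
    return best
```

Because `estimate_rates` accepts an arbitrary boolean mask, the same function can be applied to a predefined gene set (for example, the members of one pathway) rather than to a set obtained by thresholding.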
