Including probe-level uncertainty in model-based gene expression clustering

BackgroundClustering is an important analysis performed on microarray gene expression data since it groups genes which have similar expression patterns and enables the exploration of unknown gene functions. Microarray experiments are associated with many sources of experimental and biological variation and the resulting gene expression data are therefore very noisy. Many heuristic and model-based clustering approaches have been developed to cluster this noisy data. However, few of them include consideration of probe-level measurement error which provides rich information about technical variability.ResultsWe augment a standard model-based clustering method to incorporate probe-level measurement error. Using probe-level measurements from a recently developed Affymetrix probe-level model, multi-mgMOS, we include the probe-level measurement error directly into the standard Gaussian mixture model. Our augmented model is shown to provide improved clustering performance on simulated datasets and a real mouse time-course dataset.ConclusionThe performance of model-based clustering of gene expression data is improved by including probe-level measurement error and more biologically meaningful clustering results are obtained.

[1]  Brad T. Sherman,et al.  DAVID: Database for Annotation, Visualization, and Integrated Discovery , 2003, Genome Biology.

[2]  Ka Yee Yeung,et al.  Bayesian mixture model based clustering of replicated microarray data , 2004, Bioinform..

[3]  Neil D. Lawrence,et al.  A tractable probabilistic model for Affymetrix probe-level analysis across multiple chips , 2005, Bioinform..

[4]  Adrian E. Raftery,et al.  MCLUST: Software for Model-Based Cluster Analysis , 1999 .

[5]  Neil D. Lawrence,et al.  Accounting for probe-level noise in principal component analysis of microarray data , 2005, Bioinform..

[6]  A. Kudlicki,et al.  Logic of the Yeast Metabolic Cycle: Temporal Compartmentalization of Cellular Processes , 2005, Science.

[7]  G. Church,et al.  Systematic determination of genetic network architecture , 1999, Nature Genetics.

[8]  Pierre Baldi,et al.  A Bayesian framework for the analysis of microarray expression data: regularized t -test and statistical inferences of gene changes , 2001, Bioinform..

[9]  Peter W. Laird,et al.  A comparison of cluster analysis methods using DNA methylation data , 2004, Bioinform..

[10]  Roger E Bumgarner,et al.  Clustering gene-expression data with repeated measurements , 2003, Genome Biology.

[11]  Adrian E. Raftery,et al.  Model-Based Clustering, Discriminant Analysis, and Density Estimation , 2002 .

[12]  Neil D. Lawrence,et al.  Propagating uncertainty in microarray data analysis , 2006, Briefings Bioinform..

[13]  Rafael A. Irizarry,et al.  A Model-Based Background Adjustment for Oligonucleotide Expression Arrays , 2004 .

[14]  Padhraic Smyth,et al.  Identification of hair cycle-associated genes from time-course gene expression profile data by using replicate variance , 2004, Proc. Natl. Acad. Sci. USA.

[15]  Peter Spellucci,et al.  An SQP method for general nonlinear programs using only equality constrained subproblems , 1998, Math. Program..

[16]  Anne-Mette K. Hein,et al.  BGX: a fully Bayesian integrated approach to the analysis of Affymetrix GeneChip data. , 2005, Biostatistics.

[17]  G. W. Milligan,et al.  A Study of the Comparability of External Criteria for Hierarchical Cluster Analysis. , 1986, Multivariate behavioral research.

[18]  Francisco Azuaje,et al.  Cluster validation techniques for genome expression data , 2003, Signal Process..

[19]  Cheng Li,et al.  Model-based analysis of oligonucleotide arrays: model validation, design issues and standard error application , 2001, Genome Biology.

[20]  Michael A. Saunders,et al.  SNOPT: An SQP Algorithm for Large-Scale Constrained Optimization , 2002, SIAM J. Optim..

[21]  Patrik D'haeseleer,et al.  How does gene expression clustering work? , 2005, Nature Biotechnology.

[22]  J. Mesirov,et al.  Interpreting patterns of gene expression with self-organizing maps: methods and application to hematopoietic differentiation. , 1999, Proceedings of the National Academy of Sciences of the United States of America.

[23]  Neil D. Lawrence,et al.  Probe-level measurement error improves accuracy in detecting differential gene expression , 2006, Bioinform..

[24]  G. Schwarz Estimating the Dimension of a Model , 1978 .

[25]  D. Botstein,et al.  Cluster analysis and display of genome-wide expression patterns. , 1998, Proceedings of the National Academy of Sciences of the United States of America.

[26]  D. Lockhart,et al.  Expression monitoring by hybridization to high-density oligonucleotide arrays , 1996, Nature Biotechnology.

[27]  D. Rubin,et al.  Maximum likelihood from incomplete data via the EM - algorithm plus discussions on the paper , 1977 .

[28]  Partha S. Vasisht Computational Analysis of Microarray Data , 2003 .

[29]  Adrian E. Raftery,et al.  MCLUST: Software for Model-Based Clustering, Density Estimation and Discriminant Analysis , 2002 .

[30]  Ronald W. Davis,et al.  Quantitative Monitoring of Gene Expression Patterns with a Complementary DNA Microarray , 1995, Science.

[31]  A. Raftery,et al.  Model-based Gaussian and non-Gaussian clustering , 1993 .

[32]  Adrian E. Raftery,et al.  Model-based clustering and data transformations for gene expression data , 2001, Bioinform..

[33]  D. Slonim From patterns to pathways: gene expression data analysis comes of age , 2002, Nature Genetics.