Cluster analysis using multivariate normal mixture models to detect differential gene expression with microarray data

DNA microarrays make it possible to measure the expression of thousands of genes simultaneously in a biological sample. Univariate clustering techniques have been used to identify genes that are differentially expressed between two experimental conditions. Because reducing each gene to a univariate summary statistic can discard information, multivariate statistics may be more effective. We present clustering analyses based on multivariate normal mixture models to detect differential gene expression between two conditions. Departing from the general mixture model and standard model-based clustering, we propose mixture models with specific mean and covariance structures that account for special features of two-condition microarray experiments. We derive explicit updating formulas for the EM algorithm for three such models. To illustrate the performance and usefulness of the approach, we apply the methods to a real dataset comparing the expression levels of 1176 genes in rats with and without pneumococcal middle-ear infection. About 10 genes are found to be differentially expressed under a six-dimensional model and about 20 under a bivariate model. Two simulation studies compare the performance of the univariate and multivariate methods; neither method uniformly dominates the other, and which performs better depends on the data. The results suggest that multivariate normal mixture models can be useful alternatives to univariate methods for detecting differential gene expression in exploratory data analysis.
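To make the EM-based clustering idea concrete, the following is a minimal sketch of EM for a generic, unconstrained multivariate normal mixture. It is not one of the three structured models of the paper, whose mean and covariance constraints and updating formulas are not given in this abstract. The function names (`log_gauss`, `em_gmm`), the synthetic bivariate data, the initialization at two extreme points, and the small ridge added to each covariance are all assumptions made for illustration.

```python
import numpy as np

def log_gauss(X, mean, cov):
    """Log density of N(mean, cov) evaluated at each row of X."""
    d = X.shape[1]
    L = np.linalg.cholesky(cov)
    z = np.linalg.solve(L, (X - mean).T)            # shape (d, n)
    log_det = 2.0 * np.sum(np.log(np.diag(L)))
    return -0.5 * (d * np.log(2.0 * np.pi) + log_det + np.sum(z ** 2, axis=0))

def em_gmm(X, init_means, n_iter=200, tol=1e-8, ridge=1e-6):
    """EM for an unconstrained K-component normal mixture (illustrative only)."""
    n, d = X.shape
    means = np.array(init_means, dtype=float)
    K = means.shape[0]
    # Start every component from the pooled covariance (an assumption).
    pooled = np.atleast_2d(np.cov(X.T)) + ridge * np.eye(d)
    covs = np.array([pooled.copy() for _ in range(K)])
    weights = np.full(K, 1.0 / K)
    prev = -np.inf
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each gene.
        log_p = np.column_stack([
            np.log(weights[k]) + log_gauss(X, means[k], covs[k])
            for k in range(K)
        ])
        m = log_p.max(axis=1, keepdims=True)
        log_norm = m + np.log(np.exp(log_p - m).sum(axis=1, keepdims=True))
        resp = np.exp(log_p - log_norm)
        loglik = log_norm.sum()
        # M-step: weighted updates of proportions, means, and covariances.
        nk = resp.sum(axis=0)
        weights = nk / n
        means = (resp.T @ X) / nk[:, None]
        for k in range(K):
            diff = X - means[k]
            covs[k] = (resp[:, k, None] * diff).T @ diff / nk[k] + ridge * np.eye(d)
        if loglik - prev < tol:
            break
        prev = loglik
    return weights, means, covs, resp

# Synthetic stand-in for per-gene bivariate summaries (hypothetical data):
# 900 "unchanged" genes near the origin, 100 "changed" genes near (3, 3).
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0.0, 0.5, size=(900, 2)),
    rng.normal(3.0, 0.5, size=(100, 2)),
])

# Initialize the two means at the rows with the smallest and largest
# coordinate sums (an ad hoc choice for this example).
s = X.sum(axis=1)
weights, means, covs, resp = em_gmm(X, init_means=X[[s.argmin(), s.argmax()]])

# Flag genes whose posterior probability of the second component exceeds 0.5.
flagged = np.where(resp[:, 1] > 0.5)[0]
```

In a structured model of the kind the abstract describes, the E-step would keep this form while the M-step would be replaced by updates that respect the imposed mean and covariance constraints.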
