Nonparametric Bayesian functional clustering for time-course microarray data

Time-course microarray experiments track gene expression levels across several time points and provide valuable insight into the genome-wide dynamics of gene regulation. This paper focuses on gene clustering. We explore a nonparametric Bayesian method that constructs clusters in function space from the characteristics of the gene profiles. Specifically, we model each gene profile with a B-spline basis, so that each gene is characterized by the basis coefficients of its spline fit. We then place a Dirichlet process prior on the basis coefficients to determine the clusters. In effect, we construct a hierarchical Dirichlet process mixture model that assigns genes to the same cluster if they share the same latent basis coefficients. A simulation study compares the proposed method with K-means clustering, model-based clustering (MCLUST), and two-stage versions of both, using the adjusted Rand index. The proposed method attains the highest adjusted Rand index among these methods. We apply the method to a real data set with 6 time points to examine how genes with similar profiles are clustered together, and we use GOstats to find the functional annotation of the resulting clusters in Gene Ontology groups.
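As an illustration of two ingredients named above, the following Python sketch (i) reduces each gene profile to B-spline basis coefficients by least squares and (ii) computes the adjusted Rand index between two partitions. It is a minimal sketch under stated assumptions, not the paper's implementation: the function names, the cubic degree, the knot placement, and the plain least-squares fit are all hypothetical choices, and the Dirichlet process mixture step itself is not shown.

```python
from collections import Counter
from math import comb

import numpy as np


def bspline_basis(x, knots, degree):
    """Evaluate all B-spline basis functions at x (Cox-de Boor recursion)."""
    x = np.asarray(x, dtype=float)
    t = np.asarray(knots, dtype=float)
    # Degree 0: indicator of each half-open knot interval [t_j, t_{j+1}).
    B = np.zeros((x.size, t.size - 1))
    for j in range(t.size - 1):
        B[:, j] = (t[j] <= x) & (x < t[j + 1])
    # Close the last non-empty interval on the right so x == t[-1] is covered.
    last = np.max(np.nonzero(t[:-1] < t[1:])[0])
    B[x == t[-1], last] = 1.0
    for d in range(1, degree + 1):
        nxt = np.zeros((x.size, t.size - 1 - d))
        for j in range(t.size - 1 - d):
            left = t[j + d] - t[j]
            right = t[j + d + 1] - t[j + 1]
            if left > 0:
                nxt[:, j] += (x - t[j]) / left * B[:, j]
            if right > 0:
                nxt[:, j] += (t[j + d + 1] - x) / right * B[:, j + 1]
        B = nxt
    return B


def spline_coefficients(Y, B):
    """Least-squares basis coefficients, one row per gene.

    Y has shape (genes, time points); B has shape (time points, basis size).
    """
    coef, *_ = np.linalg.lstsq(B, Y.T, rcond=None)
    return coef.T


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two partitions of the same items."""
    n = len(labels_a)
    if n < 2:
        return 1.0

    def pairs(counts):
        return sum(comb(c, 2) for c in counts)

    sum_ij = pairs(Counter(zip(labels_a, labels_b)).values())
    sum_a = pairs(Counter(labels_a).values())
    sum_b = pairs(Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = 0.5 * (sum_a + sum_b)
    if max_index == expected:
        # Both partitions are trivial and identical (one cluster or all singletons).
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


# Assumed setup: 6 equally spaced time points, cubic basis, one interior knot.
time_points = np.arange(6.0)
knots = [0, 0, 0, 0, 2.5, 5, 5, 5, 5]
basis = bspline_basis(time_points, knots, degree=3)  # shape (6, 5)
```

The rows of `spline_coefficients(Y, basis)` could then be passed to any clustering routine, and `adjusted_rand_index(true_labels, estimated_labels)` gives the comparison score used in the simulation study; a value of 1 indicates identical partitions up to relabeling.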
