Bayesian model-based clustering of temporal gene expression using autoregressive panel data approach

MOTIVATION In microarray time series analysis, the large number of genes evaluated means that the first step toward understanding the complex temporal network is to cluster genes that share similar expression patterns over time. Methods proposed to date do not simultaneously address the temporal autocorrelation of gene expression and model-based clustering. We present a Bayesian method that jointly considers the fit of autoregressive panel data models and hierarchical gene clustering.

RESULTS The proposed methodology clustered genes that share similar expression over time, with similarity determined jointly by the estimates of the autoregression parameters, the average level of expression, and the quality of the fitted model.

AVAILABILITY AND IMPLEMENTATION The R code for implementing the proposed clustering method and for the simulation study, together with the real and simulated datasets, is freely accessible on the Web at http://www.det.ufv.br/~moyses/links.php.

CONTACT moysesnascim@ufv.br.
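To make the idea in the abstract concrete, the following is a minimal Python sketch, not the authors' R implementation. It assumes a first-order autoregressive model per gene, and it swaps the Bayesian estimation for ordinary least squares to stay short. Each gene is summarized by three quantities that mirror those named under RESULTS (autoregression estimate, average expression level, residual variance as a rough measure of fit), and the summaries are grouped by agglomerative clustering with Ward's criterion. The function names `fit_ar1` and `ward_clusters`, the simulated data, and all parameter values are illustrative assumptions.

```python
import random


def fit_ar1(series):
    """Least-squares fit of y_t = c + phi * y_{t-1} + e_t for one gene.

    Returns (phi, mean expression level, residual variance).
    Assumes the series is not constant (otherwise sxx would be zero).
    """
    x, y = series[:-1], series[1:]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    phi = sxy / sxx
    c = my - phi * mx
    rss = sum((b - c - phi * a) ** 2 for a, b in zip(x, y))
    return phi, sum(series) / len(series), rss / n


def ward_clusters(points, k):
    """Agglomerative clustering that merges, at each step, the pair of
    clusters with the smallest Ward cost, until k clusters remain."""
    dim = len(points[0])
    clusters = [[i] for i in range(len(points))]

    def centroid(members):
        return [sum(points[i][d] for i in members) / len(members)
                for d in range(dim)]

    def cost(a, b):
        # Increase in within-cluster sum of squares caused by merging a and b.
        d2 = sum((u - v) ** 2 for u, v in zip(centroid(a), centroid(b)))
        return len(a) * len(b) / (len(a) + len(b)) * d2

    while len(clusters) > k:
        pairs = ((i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters)))
        i, j = min(pairs, key=lambda p: cost(clusters[p[0]], clusters[p[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


def simulate_gene(phi, c, sd, length, rng):
    """Simulate one AR(1) expression profile, started at its stationary mean."""
    y = [c / (1.0 - phi)]
    for _ in range(length - 1):
        y.append(c + phi * y[-1] + rng.gauss(0.0, sd))
    return y


# Simulated panel (assumed values): genes 0-4 share one AR(1) pattern,
# genes 5-9 share another.
rng = random.Random(0)
genes = [simulate_gene(0.8, 0.4, 0.1, 50, rng) for _ in range(5)]
genes += [simulate_gene(-0.5, 0.0, 0.1, 50, rng) for _ in range(5)]

features = [fit_ar1(g) for g in genes]  # one (phi, mean, variance) per gene
clusters = ward_clusters(features, k=2)
print(sorted(sorted(c) for c in clusters))
```

In this sketch the features are clustered on their raw scales; with real data the three summaries can differ widely in magnitude, so standardizing them before clustering would be a reasonable choice. A Bayesian version would replace `fit_ar1` with posterior summaries of the same quantities.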
