Bayesian State Space Models for Inferring and Predicting Temporal Gene Expression Profiles

Predicting the dynamic behavior of genes is a challenging and important problem in genomic research, and estimating temporal correlations and non-stationarity is key to this task. Unfortunately, most existing techniques that account for temporal correlations treat the time course as a sequence of evenly spaced intervals and use stationary models with time-invariant settings. This assumption is often violated in microarray time course data, where expression is measured at unequal time points and the gaps between sampling times range from minutes to days. Furthermore, short, unevenly spaced time courses with sudden changes make the prediction of genetic dynamics difficult. In this paper, we develop two types of Bayesian state space models to tackle this challenge when inferring and predicting the gene expression profiles associated with diseases. In the univariate time-varying Bayesian state space models, we treat both the stochastic transition matrix and the observation matrix as time-variant in a linear setting and point out that this can easily be extended to a nonlinear setting. In the multivariate Bayesian state space model, we include temporal correlation structures in the covariance matrix estimation. In both models, the unseen time points of the short, unevenly spaced time courses are treated as hidden state variables. Bayesian approaches with various prior and hyper-prior models, together with MCMC algorithms, are used to estimate the model parameters and hidden variables. We apply our models to multiple-tissue polygenetic Affymetrix data sets. Results show that the proposed models capture the dynamic behavior of the genes well in their predictions.
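The abstract describes the models only in words. As a hedged illustration, and not the authors' implementation, the sketch below assumes a scalar time-varying linear Gaussian state space model, x_t = a_t x_{t-1} + w_t with w_t ~ N(0, q_t) and y_t = c_t x_t + v_t with v_t ~ N(0, r_t), laid out on a regular time grid where unsampled time points are NaN and therefore hidden. It draws the hidden states with forward-filtering backward-sampling (FFBS), a standard Gibbs step in MCMC for linear Gaussian state space models; the abstract does not say whether the authors use FFBS. The function name `ffbs`, the Gaussian noise, the fixed parameter values, and the toy data are all assumptions for illustration. In a full sampler, a_t, c_t, q_t and r_t would themselves be drawn from their conditional posteriors under the chosen priors and hyper-priors.

```python
import numpy as np


def ffbs(y, a, c, q, r, m0=0.0, p0=1.0, rng=None):
    """Draw one sample of hidden states x_0..x_{T-1} for the (assumed) model
        x_t = a[t] * x_{t-1} + w_t,   w_t ~ N(0, q[t])
        y_t = c[t] * x_t     + v_t,   v_t ~ N(0, r[t])
    with prior x_0 ~ N(m0, p0). NaN entries of y mark unsampled time points:
    the filter skips their update step, so they are inferred as hidden states.
    """
    rng = np.random.default_rng() if rng is None else rng
    T = len(y)
    m = np.empty(T)  # filtered means E[x_t | y_0..y_t]
    p = np.empty(T)  # filtered variances Var[x_t | y_0..y_t]

    # Forward pass: Kalman filter with time-varying coefficients.
    for t in range(T):
        if t == 0:
            m_pred, p_pred = m0, p0  # prior for the first state
        else:
            m_pred = a[t] * m[t - 1]
            p_pred = a[t] ** 2 * p[t - 1] + q[t]
        if np.isnan(y[t]):
            m[t], p[t] = m_pred, p_pred  # unobserved: prediction only
        else:
            s = c[t] ** 2 * p_pred + r[t]  # innovation variance
            k = p_pred * c[t] / s  # Kalman gain
            m[t] = m_pred + k * (y[t] - c[t] * m_pred)
            p[t] = p_pred * r[t] / s  # equals (1 - k * c[t]) * p_pred

    # Backward pass: sample x_{T-1}, then x_t | x_{t+1}, y_0..y_t.
    x = np.empty(T)
    x[T - 1] = rng.normal(m[T - 1], np.sqrt(p[T - 1]))
    for t in range(T - 2, -1, -1):
        p_next = a[t + 1] ** 2 * p[t] + q[t + 1]  # Var[x_{t+1} | y_0..y_t]
        g = p[t] * a[t + 1] / p_next  # backward gain
        mean = m[t] + g * (x[t + 1] - a[t + 1] * m[t])
        var = p[t] - g * a[t + 1] * p[t]
        x[t] = rng.normal(mean, np.sqrt(max(var, 0.0)))
    return x


if __name__ == "__main__":
    # Hypothetical toy data: an hourly grid over 0..12 h with measurements
    # at only a few unevenly spaced hours; all other hours are NaN (hidden).
    y = np.full(13, np.nan)
    y[[0, 1, 2, 6, 12]] = [0.1, 0.5, 1.2, 0.8, 0.2]
    a = np.full(13, 0.95)  # assumed transition coefficients
    c = np.ones(13)        # assumed observation coefficients
    q = np.full(13, 0.05)  # assumed state noise variances
    r = np.full(13, 0.01)  # assumed observation noise variances
    draws = np.array([ffbs(y, a, c, q, r, rng=np.random.default_rng(s))
                      for s in range(200)])
    # Monte Carlo estimate of the posterior mean at every hour,
    # including the hours that were never measured.
    print(draws.mean(axis=0))
```

Passing arrays for a, c, q and r (rather than scalars) is what makes the model time-varying; constant arrays, as in the toy run, reduce it to the time-invariant case the abstract argues is too restrictive.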