Stochastic Dynamic Modeling of Short Gene Expression Time-Series Data

In this paper, the expectation maximization (EM) algorithm is applied to model gene regulatory networks from gene expression time-series data. The gene regulatory network is viewed as a stochastic dynamic model consisting of two parts: noisy gene expression measurements from microarrays, and a first-order autoregressive (AR) stochastic dynamic process that describes gene regulation. With the EM algorithm, both the model parameters and the actual values of the gene expression levels can be identified simultaneously. Moreover, the algorithm handles sparse parameter identification and noisy data efficiently. It is also shown that the EM algorithm can handle microarray gene expression data with a large number of variables but a small number of observations. Stochastic dynamic models are constructed for four real-world gene expression data sets to demonstrate the advantages of the introduced algorithm. Several indices are proposed to evaluate the inferred gene regulatory network models, and the relevant biological properties are discussed.
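
To make the model structure concrete, the following is a minimal sketch of EM for a first-order AR state process observed through additive measurement noise. It is not the authors' implementation and rests on several assumptions that the abstract does not state: the measurement is the true expression level plus noise (identity measurement matrix), both noise covariances are isotropic with known variances q and r, the initial state has zero mean and identity covariance, and only the regulation matrix A is re-estimated. The function names (simulate, e_step, m_step) are hypothetical. The E-step is a standard Kalman filter followed by a Rauch-Tung-Striebel smoother; the M-step is the usual closed-form update for A.

```python
import numpy as np

def simulate(A, q, r, T, rng):
    """Draw x[k+1] = A x[k] + w[k] and y[k] = x[k] + v[k] (assumed model form)."""
    n = A.shape[0]
    x = np.zeros((T, n))
    x[0] = rng.normal(size=n)                      # assumed prior: N(0, I)
    for k in range(1, T):
        x[k] = A @ x[k - 1] + np.sqrt(q) * rng.normal(size=n)
    y = x + np.sqrt(r) * rng.normal(size=(T, n))   # noisy measurements
    return x, y

def e_step(y, A, q, r):
    """Kalman filter + RTS smoother: estimates of the hidden expression levels."""
    T, n = y.shape
    I = np.eye(n)
    Q, R = q * I, r * I
    xp = np.zeros((T, n)); Pp = np.zeros((T, n, n))   # predicted mean / covariance
    xf = np.zeros((T, n)); Pf = np.zeros((T, n, n))   # filtered mean / covariance
    Pp[0] = I                                         # assumed prior covariance
    for k in range(T):
        if k > 0:
            xp[k] = A @ xf[k - 1]
            Pp[k] = A @ Pf[k - 1] @ A.T + Q
        K = Pp[k] @ np.linalg.inv(Pp[k] + R)          # Kalman gain
        xf[k] = xp[k] + K @ (y[k] - xp[k])
        Pf[k] = (I - K) @ Pp[k]
    xs, Ps = xf.copy(), Pf.copy()                     # smoothed mean / covariance
    Pc = np.zeros((T - 1, n, n))                      # Cov(x[k+1], x[k] | all y)
    for k in range(T - 2, -1, -1):
        J = Pf[k] @ A.T @ np.linalg.inv(Pp[k + 1])    # smoother gain
        xs[k] = xf[k] + J @ (xs[k + 1] - xp[k + 1])
        Ps[k] = Pf[k] + J @ (Ps[k + 1] - Pp[k + 1]) @ J.T
        Pc[k] = Ps[k + 1] @ J.T
    return xs, Ps, Pc

def m_step(xs, Ps, Pc):
    """Closed-form update of A from the smoothed second moments."""
    S10 = Pc.sum(axis=0) + xs[1:].T @ xs[:-1]         # sum of E[x[k+1] x[k]^T]
    S00 = Ps[:-1].sum(axis=0) + xs[:-1].T @ xs[:-1]   # sum of E[x[k] x[k]^T]
    return S10 @ np.linalg.inv(S00)

# Small illustrative run: 3 genes, 20 time points (values chosen arbitrarily).
rng = np.random.default_rng(0)
A_true = np.array([[0.7, 0.2, 0.0],
                   [0.0, 0.5, 0.3],
                   [0.0, 0.0, 0.6]])
x_true, y_obs = simulate(A_true, q=0.1, r=0.1, T=20, rng=rng)

A_hat = np.zeros((3, 3))                              # crude starting guess
for _ in range(25):
    xs, Ps, Pc = e_step(y_obs, A_hat, q=0.1, r=0.1)
    A_hat = m_step(xs, Ps, Pc)
print(np.round(A_hat, 2))
```

Each iteration alternates between estimating the hidden expression levels given the current A and re-estimating A given those estimates, which is the sense in which parameters and actual expression values are identified together. The sketch does not impose sparsity on A and uses more time points than genes; handling sparse parameters and fewer observations than variables, as described in the abstract, would need additional steps not shown here.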
