A framework for modelling virus gene expression data

Short, high-dimensional Multivariate Time Series (MTS) data are common in many fields, such as medicine, finance and science, and any advance in modelling this kind of data would be beneficial. Nowhere is this truer than in functional genomics, where effective ways of analysing gene expression data are urgently needed. Progress in this area could help obtain a “global” view of biological processes and ultimately lead to a great improvement in the quality of human life. We present a computational framework for modelling this type of data and report experimental results from applying it to the analysis of gene expression data in the virology domain. The framework follows a three-step modelling strategy: correlation search, variable grouping, and short MTS modelling. Each step involves novel research and has been tested individually on different real-world datasets in engineering and medicine. This is the first attempt to integrate all these components into a coherent computational framework and to test it on a very challenging application area, and the results are promising.
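The abstract names the first two steps of the strategy but does not say how they are carried out. The Python sketch below only illustrates what "correlation search" followed by "variable grouping" can look like on a short MTS, under loudly stated assumptions: the search is a brute-force scan over lagged Pearson correlations (a simple stand-in, not the authors' method), and grouping just takes connected components of the strong-correlation graph. The function names `correlation_search` and `group_variables`, the `max_lag` and `threshold` parameters, and the toy series `g1` to `g4` are all hypothetical. The third step, short MTS modelling, is not sketched.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def lagged_corr(x, y, lag):
    """Correlation between x[t] and y[t + lag], i.e. x leading y by `lag` steps."""
    return pearson(x[: len(x) - lag], y[lag:])


def correlation_search(series, max_lag, threshold):
    """Hypothetical stand-in for step 1: exhaustively try every variable pair and
    every lag 0..max_lag in both directions, keep the strongest correlation per
    pair if |r| >= threshold. Returns (leader, follower, lag, r), strongest first."""
    n = min(len(s) for s in series.values())
    if max_lag > n - 3:
        raise ValueError("series too short for this max_lag")
    links = []
    for a, b in combinations(sorted(series), 2):
        best = None
        for lag in range(max_lag + 1):
            for lead, follow in ((a, b), (b, a)):  # identical at lag 0
                r = lagged_corr(series[lead], series[follow], lag)
                if best is None or abs(r) > abs(best[3]):
                    best = (lead, follow, lag, r)
        if abs(best[3]) >= threshold:
            links.append(best)
    return sorted(links, key=lambda link: -abs(link[3]))


def group_variables(names, links):
    """Hypothetical stand-in for step 2: variables joined by a strong link end up
    in the same group (connected components, found with union-find)."""
    parent = {n: n for n in names}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for lead, follow, _lag, _r in links:
        parent[find(lead)] = find(follow)
    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())


# Toy short MTS: six time points, four hypothetical "genes".
series = {
    "g1": [1, 2, 3, 4, 5, 6],
    "g2": [0, 1, 2, 3, 4, 5],  # moves together with g1
    "g3": [1, 3, 1, 3, 1, 3],
    "g4": [0, 1, 3, 1, 3, 1],  # repeats g3 one step later
}
links = correlation_search(series, max_lag=1, threshold=0.9)
print(group_variables(series, links))  # [['g1', 'g2'], ['g3', 'g4']]
```

Exhaustive search is feasible here only because the toy example has four variables and one lag; with thousands of genes and larger lags the number of candidate (pair, lag) combinations grows quickly, which is the kind of search space a more sophisticated strategy would have to handle.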
