Constructing Bayesian Network Models of Gene Expression Networks from Microarray Data

Through their transcript products, genes regulate the rates at which an immense variety of transcripts and subsequent proteins occur. Understanding the mechanisms that determine which genes are expressed, and when, is one of the keys to genetic manipulation for many purposes, including the development of new treatments for disease. Each gene in a genome can be viewed as a distinct variable that is either on (expressed) or off (not expressed) or, more realistically, as a continuous variable (the rate of expression). The values of some of these variables influence the values of others through the regulatory proteins the genes express; this includes the possibility that the rate of expression of a gene at one time may, in various circumstances, influence the rate of expression of that same gene at a later time. If we imagine an arrow drawn from each gene expression variable at a given time to each gene expression variable that it influences a short while later, the result is a network, technically a directed acyclic graph (DAG); it is acyclic because every arrow points from an earlier time to a later one. For example, the DAG in Figure 1 represents a system in which the expression level of gene G1 at time 1 (denoted G1(1)) causes the expression level of G2 at time 2 (G2(2)), which in turn causes the expression level of G3 at time 3 (G3(3)). The arrows in Figure 1 that have no variable at their tails are “error terms”, which represent all causes of a variable other than those explicitly represented in the DAG. The DAG describes more than associations: it describes causal connections among gene expression rates. A shock to a cell (by mutation, heating, chemical treatment, etc.) may alter the DAG describing the relations among gene expressions, for example by activating a gene that was otherwise not expressed, producing a cascade of new expression effects.
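The chain G1(1) → G2(2) → G3(3) with independent error terms can be illustrated by a small simulation. The sketch below is not taken from the paper: it assumes a linear model with Gaussian error terms, and the coefficients `b12` and `b23`, the sample size, and all function names are hypothetical choices made only for illustration. It shows the kind of statistical signature that causal search techniques rely on: G1 and G3 are correlated, but the correlation nearly vanishes once G2 is held fixed.

```python
import math
import random


def pearson(x, y):
    """Sample Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def partial_corr(x, y, z):
    """First-order partial correlation of x and y given z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


def simulate_chain(n_samples, b12=0.8, b23=0.8, seed=0):
    """Draw samples from the chain G1(1) -> G2(2) -> G3(3).

    Assumed (hypothetical) model: each variable is a linear function of
    its single parent plus an independent standard normal error term.
    """
    rng = random.Random(seed)
    g1, g2, g3 = [], [], []
    for _ in range(n_samples):
        x1 = rng.gauss(0, 1)              # error term is the only cause of G1(1)
        x2 = b12 * x1 + rng.gauss(0, 1)   # G1(1) plus error term cause G2(2)
        x3 = b23 * x2 + rng.gauss(0, 1)   # G2(2) plus error term cause G3(3)
        g1.append(x1)
        g2.append(x2)
        g3.append(x3)
    return g1, g2, g3


g1, g2, g3 = simulate_chain(5000)
r13 = pearson(g1, g3)                  # marginal association of G1(1) and G3(3)
r13_given_2 = partial_corr(g1, g3, g2) # association after conditioning on G2(2)
print(f"corr(G1, G3)      = {r13:.3f}")
print(f"corr(G1, G3 | G2) = {r13_given_2:.3f}")
```

Under these assumptions the marginal correlation is clearly nonzero while the partial correlation is close to zero, which is what the chain structure implies; real expression data need not be linear or Gaussian.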
Although “knockout” experiments (which lower a gene’s expression level) can reveal some of the underlying causal network of gene expression levels, such experiments, unless guided by information from other sources, are limited in how much of the network structure they can reveal, because of the sheer number of combinations of experimental manipulations of genes needed to reveal the complete causal network. Recent developments have made it possible to compare quantitatively the expression of tens of thousands of genes in cells from different sources in a single experiment, and to trace gene expression over time in thousands of genes simultaneously. cDNA microarrays are already producing extensive data, much of it available on the web. There are therefore calls for analytic software that can be applied to microarray and other data to help infer regulatory networks (Weinzierl, 1999). In this paper we review current techniques for searching for causal relations between variables, describe the algorithmic and data-gathering obstacles to applying these techniques to gene expression levels, and describe the prospects for overcoming these obstacles.
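To give a rough sense of the combinatorial obstacle facing knockout experiments, the following sketch counts how many distinct experiments exist if up to a fixed number of genes are knocked out simultaneously. The function name and the genome size of 6000 genes are illustrative assumptions, not figures from the paper, and the count ignores replicates and measurement conditions.

```python
from math import comb


def knockout_combinations(n_genes, max_simultaneous):
    """Count the distinct sets of 1..max_simultaneous genes that could be knocked out."""
    return sum(comb(n_genes, k) for k in range(1, max_simultaneous + 1))


# Hypothetical genome of 6000 genes (an assumed, illustrative size).
N_GENES = 6000
for k in (1, 2, 3):
    print(f"up to {k} simultaneous knockouts: {knockout_combinations(N_GENES, k):,}")
```

Even with at most two simultaneous knockouts, the count for this assumed genome size already runs into the tens of millions, which is why such experiments need guidance from other sources of information.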

[1] R. Tibshirani, et al. Clustering methods for the analysis of DNA microarray data, 1999.

[2] Patrik D'haeseleer, et al. Linear Modeling of mRNA Expression Levels During CNS Development and Injury, 1998, Pacific Symposium on Biocomputing.

[3] D. Botstein, et al. Cluster analysis and display of genome-wide expression patterns, 1998, Proceedings of the National Academy of Sciences of the United States of America.

[4] Richard Scheines, et al. Causation, Prediction, and Search, Second Edition, 2000, Adaptive computation and machine learning.

[5] Roland Somogyi, et al. Modeling the complexity of genetic networks: Understanding multigenic and pleiotropic regulation, 1996, Complex.

[6] Stuart A. Kauffman, et al. The origins of order, 1993.

[7] Nir Friedman, et al. Learning Belief Networks in the Presence of Missing Values and Hidden Variables, 1997, ICML.

[8] Michal Linial, et al. Using Bayesian Networks to Analyze Expression Data, 2000, J. Comput. Biol.

[9] Arantxa Etxeverria. The Origins of Order, 1993.

[10] Mtw, et al. Computation, causation, and discovery, 2000.

[11] Stuart A. Kauffman, et al. ORIGINS OF ORDER, 2019, Origins of Order.

[12] L. Lazzeroni. Plaid models for gene expression data, 2000.

[13] Trevor Hastie, et al. Gene Shaving: a new class of clustering methods for expression arrays, 2000.

[14] Thomas S. Richardson, et al. A Discovery Algorithm for Directed Cyclic Graphs, 1996, UAI.

[15] Michael Ruogu Zhang, et al. Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization, 1998, Molecular biology of the cell.

[16] S Fuhrman, et al. Reveal, a general reverse engineering algorithm for inference of genetic network architectures, 1998, Pacific Symposium on Biocomputing.

[17] Gary D. Stormo, et al. Modeling Regulatory Networks with Weight Matrices, 1998, Pacific Symposium on Biocomputing.

[18] Judea Pearl, et al. Probabilistic reasoning in intelligent systems, 1988.

[19] Robert O J Weinzierl. Mechanisms of Gene Expression: Structure, Function and Evolution of the Basal Transcriptional Machinery, 1999.