Reconstructing gene regulation networks from passive observations and active interventions

A Bayesian network is a graph-based representation of a joint probability distribution that captures conditional independence properties between variables. The representation consists of two components. The first is a directed acyclic graph (DAG), in which the nodes represent genes and an arrow between two nodes indicates that one gene directly regulates the expression of the other. The second describes a conditional distribution for each node given its parents in the graph. There are two basic ways to recover the structure of a Bayesian network: passively observing the underlying network, or actively perturbing it and analyzing the effects of setting some nodes to fixed values by human intervention. Biological data are readily obtainable for both approaches. Microarray experiments provide a snapshot of the activity of several thousand genes simultaneously. Gene perturbation as a method to identify regulatory pathways has a long tradition in biology; evaluating the effects of interventions in knock-out or RNAi experiments is by now a standard procedure. Methods for building Bayesian networks from observational data fall into two classes: methods that use a scoring function to evaluate how well a network matches the data [1, 2], and methods that perform tests for conditional independence on the observations [4, 5]. The biological interpretation of the graphs produced by these methods is hindered by the fact that the representation of a joint distribution as a Bayesian network is not unique. Many different networks, differing in the directions of some edges, can represent the same joint distribution. They indicate entirely different gene regulation pathways but are statistically equivalent: even with infinitely many data we cannot decide between them. Learning an equivalence class of networks is as far as we can get by relying on passive observations alone. To resolve the structure further, we need information about the effects of interventions. An intervention determines the directions of the edges between the perturbed node and its neighbors [6].
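
The equivalence argument can be illustrated with a minimal sketch on two binary genes. The gene names A and B, the probability values, and all variable and function names below are hypothetical and chosen only for illustration. The sketch builds the joint distribution of the network A -> B, refactors it as B -> A, and checks that both networks yield the same joint distribution. It then compares what each network predicts for B when A is set to 1 by intervention, which is where the two structures differ.

```python
from itertools import product

# Hypothetical parameters for a two-gene network A -> B with binary states.
p_a = {0: 0.7, 1: 0.3}                      # P(A)
p_b_given_a = {0: {0: 0.9, 1: 0.1},         # P(B | A=0)
               1: {0: 0.2, 1: 0.8}}         # P(B | A=1)

states = list(product((0, 1), repeat=2))

# Joint distribution implied by A -> B: P(A, B) = P(A) * P(B | A).
joint = {(a, b): p_a[a] * p_b_given_a[a][b] for a, b in states}

# Refactor the same joint for the reversed network B -> A:
# P(A, B) = P(B) * P(A | B).
p_b = {b: joint[(0, b)] + joint[(1, b)] for b in (0, 1)}
p_a_given_b = {b: {a: joint[(a, b)] / p_b[b] for a in (0, 1)} for b in (0, 1)}
joint_reversed = {(a, b): p_b[b] * p_a_given_b[b][a] for a, b in states}

# Passive observation: both networks describe the same joint distribution.
max_diff = max(abs(joint[s] - joint_reversed[s]) for s in states)

# Intervention that fixes A = 1:
# in A -> B, B responds and follows P(B | A=1);
# in B -> A, B has no parent affected by the intervention and keeps P(B).
b_after_do_in_a_to_b = p_b_given_a[1]
b_after_do_in_b_to_a = p_b

print("max difference between joints:", max_diff)
print("P(B=1) after fixing A=1, network A -> B:", b_after_do_in_a_to_b[1])
print("P(B=1) after fixing A=1, network B -> A:", b_after_do_in_b_to_a[1])
```

Under these assumed parameters the two networks are indistinguishable from observations, yet they predict different distributions of B once A is perturbed, so an experiment that fixes A can orient the edge between A and B.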