The Computational and Experimental Complexity of Gene Perturbations for Regulatory Network Search

Our primary interest is in how many gene perturbation experiments are required to determine the network of regulatory relations among any given set of genes, ignoring questions of uncertainty in statistical decisions. Secondarily, we are interested in whether it is feasible to compute which further experiments will be most informative. Finally, since the sample sizes (number of expression measurements per gene) in such experiments are typically small, we are concerned with the stability of statistical decisions about differential expression under different statistical tests. Various algorithms have been proposed for learning (partial) genetic regulatory networks through systematic measurements of differential expression in wild type versus strains in which expression of specific genes has been suppressed or enhanced, as well as for determining the most informative next experiment in a sequence. While the behavior of these algorithms has been investigated for toy examples, the full computational complexity of the problem has not received sufficient attention. We show that finding the true regulatory network requires, in the worst case, a number of experiments that is exponential in the number of genes. Perhaps more importantly, we provide an algorithm for determining the set of regulatory networks consistent with the observed data. We then show that this algorithm is infeasible for realistic data (specifically, nine genes and ten experiments). This infeasibility is not due to an algorithmic flaw, but rather to the fact that there are far too many networks consistent with the data (10 in the provided example). We conclude that gene perturbation experiments are useful for confirming regulatory network models discovered by other techniques, but are not a feasible search strategy. The answers we find are far more pessimistic than has previously been suggested in the literature (e.g., [Onami et al., 2001; Ideker et al., 2000]).
We show that, while perturbation experiments can eventually eliminate possible regulatory relations, they do not efficiently eliminate them. We give an anytime algorithm for computing weakly monotonically increasing lower bounds on the number of alternative network explanations for the results of any set of gene perturbation experiments. The lower bound is typically astronomical. We illustrate the point by computing a lower bound—10—on the number of networks for 9 genes that are consistent with a recent series of gene perturbation experiments [Ideker et al., 2001]. Finally, we argue that the computation of the most informative experiments can only be heuristic.
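
To give a feel for why the set of consistent networks grows so quickly, the following is a minimal brute-force sketch in Python. It is not the algorithm described above. It assumes a deliberately simplified, hypothetical consistency criterion: a network is consistent with an experiment if the genes whose expression changed when gene g was perturbed are exactly the genes reachable from g along directed edges. The names `total_graphs`, `descendants`, and `consistent_networks` are illustrative only.

```python
from itertools import product


def total_graphs(n):
    """Number of directed graphs on n labeled genes with no self-loops."""
    return 2 ** (n * (n - 1))


def descendants(edges, n, src):
    """Genes reachable from src along directed edges, excluding src itself."""
    seen = set()
    stack = [src]
    while stack:
        u = stack.pop()
        for v in range(n):
            if (u, v) in edges and v not in seen:
                seen.add(v)
                stack.append(v)
    seen.discard(src)
    return seen


def consistent_networks(n, experiments):
    """Yield every edge set consistent with the experiments.

    ASSUMPTION (not from the paper): perturbing gene g changes the
    expression of exactly the descendants of g in the true network.
    `experiments` maps a perturbed gene to the set of genes that changed.
    """
    pairs = [(i, j) for i in range(n) for j in range(n) if i != j]
    for mask in product([False, True], repeat=len(pairs)):
        edges = {p for p, keep in zip(pairs, mask) if keep}
        if all(descendants(edges, n, g) == set(changed)
               for g, changed in experiments.items()):
            yield edges


# Toy example: 3 genes, one experiment in which perturbing gene 0
# changed the expression of genes 1 and 2.
n = 3
experiments = {0: {1, 2}}
count = sum(1 for _ in consistent_networks(n, experiments))
print(count, "of", total_graphs(n), "networks remain consistent")
print("candidate networks for 9 genes:", total_graphs(9))
```

Even in this toy setting a single experiment leaves half of the candidate networks standing, and exhaustive enumeration of this kind is only possible for a handful of genes: for 9 genes the unconstrained space already contains 2^72 directed graphs.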