Network cleanup

Nature Biotechnology, volume 31, number 8, August 2013

Babak Alipanahi and Brendan J. Frey are with the Departments of Electrical and Computer Engineering and the Donnelly Centre for Cellular and Biomolecular Research at the University of Toronto, Toronto, Ontario, Canada. e-mail: babak@psi.toronto.edu or frey@psi.toronto.edu

Networks offer an alluring simplicity for representing complex systems of interacting parts¹. But when networks are constructed from biological data through statistical inference, it is often unclear how faithfully they represent the real systems. In many cases, true links between nodes are obscured by a sea of noise in the form of erroneous links.

In this issue, two studies by Feizi et al.² and Barzel et al.³ describe efficient, easily implemented methods for identifying and removing erroneous links, thereby producing more accurate networks. Both papers demonstrate the application of their techniques to large-scale practical problems, such as the DREAM5 gene regulatory network inference challenge⁴. In addition, Feizi et al.² explore other applications by analyzing networks of interacting residues in protein structures and social networks of scientists.

The problem of erroneous links in inferred networks was described in 1921 by Sewall Wright, the geneticist and founder of the field of network inference, who said, “The degree of correlation between two variables can be calculated by well-known methods, but when it is found it gives merely the resultant of all connecting paths of influence”⁵. As an example, suppose one gene directly controls a second gene, which in turn directly controls a third gene. Correlation analysis will erroneously indicate that the first gene directly influences the third gene. Other methods for linking variables, such as mutual information and distance correlation, are limited by the same problem. The goal of network inference is to identify the direct links and their strengths while suppressing the indirect, or transitive, associations. This problem is difficult because experimental techniques often cannot distinguish between direct and indirect effects.

Feizi et al.² and Barzel et al.³ tackle the problem of network inference from conceptually quite different starting points. Feizi et al.² view the measured correlations as a consequence of flows along the edges in the true network. In contrast, Barzel et al.³ treat the measured correlations as small perturbations that result from adding up the small perturbations induced along edges in the true network. In both cases, the authors turn the seemingly intractable problem of network inference into easily implemented algorithms that invert these processes to obtain the true network from the measured correlations.

To illustrate the methods, we have implemented them, applied them to a ‘toy’ gene regulatory network problem and compared the results with the known true network of interactions (Fig. 1a). Both methods account for how the total, measured effect of a source node on a target node is mediated by the direct neighbors of the target (Fig. 1b). In addition, if the source is directly connected to the target node, that direct effect is accounted for too. This accounting is not quite correct because of loops, but it is reasonably accurate if the strengths of indirect effects decay substantially as they propagate around the loops.

As a concrete example, consider the network shown in Figure 1a, in which circles represent genes and links between genes
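
Wright's three-gene chain described above can be made concrete with a small numerical sketch. It assumes, purely for illustration, that the observed association matrix is the sum of the direct effects and of all longer paths through the network (G_obs = G_dir + G_dir^2 + ...), which is one simple reading of the "flows along edges" view. Under that assumption the direct matrix follows in closed form as G_dir = G_obs (I + G_obs)^-1. The edge weights, the function names and the series model itself are assumptions of this sketch; the published methods include further steps that are not shown here.

```python
# Toy illustration only: the geometric-series model, the weights and all
# function names are assumptions of this sketch, not the published algorithms.

def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def identity(n):
    """Return the n-by-n identity matrix."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def add(a, b):
    """Add two matrices element by element."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def inverse(m):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(m)
    aug = [list(row) + id_row for row, id_row in zip(m, identity(n))]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def observed_from_direct(g_dir, n_terms=60):
    """Sum G + G^2 + ... + G^n_terms: direct effects plus indirect paths."""
    total = [row[:] for row in g_dir]
    power = [row[:] for row in g_dir]
    for _ in range(n_terms - 1):
        power = matmul(power, g_dir)
        total = add(total, power)
    return total


def deconvolve(g_obs):
    """Invert the series model: G_dir = G_obs (I + G_obs)^-1."""
    n = len(g_obs)
    return matmul(g_obs, inverse(add(identity(n), g_obs)))


# Chain gene1 - gene2 - gene3; there is no direct gene1-gene3 link.
G_DIRECT = [[0.0, 0.4, 0.0],
            [0.4, 0.0, 0.4],
            [0.0, 0.4, 0.0]]

g_observed = observed_from_direct(G_DIRECT)
g_recovered = deconvolve(g_observed)
print("observed gene1-gene3 association:", round(g_observed[0][2], 4))
print("recovered gene1-gene3 link:", round(g_recovered[0][2], 4))
```

In this toy setting the observed matrix carries a clearly nonzero gene1-gene3 entry even though no such edge exists, and inverting the series returns that entry to zero up to rounding while restoring the two true edge weights. The series only converges when the direct effects are weak enough that indirect contributions decay along longer paths, which mirrors the caveat about loops.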