Bridging structural biology and genomics: assessing protein interaction data with known complexes.

Currently, there is a major effort to map protein-protein interactions on a genome-wide scale. The utility of the resulting interaction networks will depend on the reliability of the experimental methods and the coverage of the approaches. Known macromolecular complexes provide a defined and objective set of protein interactions against which biochemical and genetic data can be compared for validation. Here, we show that a significant fraction of the protein-protein interactions in genome-wide datasets, as well as many of the individual interactions reported in the literature, are inconsistent with the known 3D structures of three recently solved complexes (RNA polymerase II, Arp2/3 and the proteasome). Furthermore, comparisons among genome-wide datasets, and between them and a larger (but less well resolved) group of 174 complexes, also show marked inconsistencies. Finally, because individual interaction datasets are inherently noisy, they are best used in combination, and we show how simple Bayesian approaches can integrate them, significantly decreasing the error rate.
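The abstract does not spell out the Bayesian model. One common way to combine noisy interaction datasets is a naive Bayes scheme, which is sketched below. For each dataset, the example estimates how much more often a pair is reported when it truly interacts than when it does not. It does this against a gold standard: pairs inside known complexes count as positives, and pairs assumed not to interact count as negatives. The prior odds are then multiplied by one likelihood ratio per dataset. All of the following are hypothetical placeholders, not values from this work: the dataset names (`y2h`, `tap`), the toy protein pairs, the pseudocount, and the prior odds. The independence assumption is also an assumption.

```python
from typing import FrozenSet, List, Set, Tuple

Pair = FrozenSet[str]


def pair(a: str, b: str) -> Pair:
    """Unordered protein pair, so (A, B) and (B, A) denote the same interaction."""
    return frozenset((a, b))


def likelihood_ratios(dataset: Set[Pair], positives: Set[Pair],
                      negatives: Set[Pair], pseudocount: float = 1.0) -> Tuple[float, float]:
    """Estimate likelihood ratios for 'pair reported' and 'pair not reported'.

    positives: gold-standard pairs assumed to interact (e.g. co-members of a known complex).
    negatives: gold-standard pairs assumed not to interact.
    The pseudocount keeps both ratios finite when a count is zero.
    """
    tp = len(dataset & positives)  # reported pairs that are true interactions
    fp = len(dataset & negatives)  # reported pairs that are non-interactions
    p_true = (tp + pseudocount) / (len(positives) + 2 * pseudocount)
    p_false = (fp + pseudocount) / (len(negatives) + 2 * pseudocount)
    return p_true / p_false, (1 - p_true) / (1 - p_false)


def posterior_probability(p: Pair, datasets: List[Set[Pair]],
                          ratios: List[Tuple[float, float]], prior_odds: float) -> float:
    """Naive Bayes: multiply prior odds by each dataset's ratio.

    Assumes the datasets are conditionally independent given the true interaction state.
    """
    odds = prior_odds
    for ds, (lr_present, lr_absent) in zip(datasets, ratios):
        odds *= lr_present if p in ds else lr_absent
    return odds / (1 + odds)


# Hypothetical toy data: one 3-protein complex (A, B, C) plus a D-E dimer.
positives = {pair("A", "B"), pair("A", "C"), pair("B", "C"), pair("D", "E")}
negatives = {pair(x, y) for x in "ABC" for y in "DE"}

y2h = {pair("A", "B"), pair("D", "E"), pair("A", "D")}                  # hypothetical screen 1
tap = {pair("A", "B"), pair("A", "C"), pair("B", "C"), pair("C", "E")}  # hypothetical screen 2
datasets = [y2h, tap]
ratios = [likelihood_ratios(ds, positives, negatives) for ds in datasets]

prior_odds = 1 / 600  # hypothetical: roughly 1 true interaction per 600 candidate pairs
for candidate in [pair("A", "B"), pair("A", "D"), pair("X", "Y")]:
    print(sorted(candidate), posterior_probability(candidate, datasets, ratios, prior_odds))
```

A pair reported by both screens accumulates both likelihood ratios and ends up ranked above a pair seen in only one. This is how combining datasets can lower the error rate at a fixed confidence threshold. If the datasets share systematic biases, the independence assumption overstates the combined confidence. A model that accounts for correlations between datasets would then be more appropriate.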