Precision and recall estimates for two-hybrid screens

Motivation: Yeast two-hybrid screens are an important method for mapping pairwise protein interactions. The method can generate spurious interactions (false discoveries) and can miss true interactions (false negatives). Previously, we reported a capture–recapture estimator for bait-specific precision and recall. Here, we present an improved method that better accounts for heterogeneity in bait-specific error rates.

Results: For yeast, worm and fly screens, we estimate the overall false discovery rates (FDRs) to be 9.9%, 13.2% and 17.0%, respectively, and the false negative rates (FNRs) to be 51%, 42% and 28%. Bait-specific FDRs and the estimated protein degrees are then used to identify protein categories that yield more (or fewer) false positive interactions and more (or fewer) interaction partners. While membrane proteins have been suggested to have elevated FDRs, the current analysis suggests that intrinsic membrane proteins may actually have reduced FDRs. Hydrophobicity is correlated with lower error rates and with fewer interaction partners. These methods will be useful for future two-hybrid screens, which could use ultra-high-throughput sequencing for deeper sampling of interacting bait–prey pairs.

Availability: All software (C source) and datasets are available as supplemental files and at http://www.baderzone.org under the Lesser GPL v. 3 license.

Contact: joel.bader@jhu.edu

Supplementary information: Supplementary data are available at Bioinformatics online.
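
As a rough illustration of the quantities named above, the following Python sketch converts the reported overall FDRs and FNRs into precision (1 - FDR) and recall (1 - FNR), and shows the textbook two-sample Lincoln-Petersen capture-recapture estimate. The function names and the dictionary layout are assumptions made for this example. The Lincoln-Petersen formula is a generic stand-in for the capture-recapture idea and is not the bait-specific estimator of this work, which is distributed as C source.

```python
# Illustrative sketch only: names below are hypothetical, and the
# Lincoln-Petersen formula is a generic textbook estimator, not the
# bait-specific method described in the abstract.

def precision_from_fdr(fdr):
    """Precision is the complement of the false discovery rate."""
    return 1.0 - fdr


def recall_from_fnr(fnr):
    """Recall is the complement of the false negative rate."""
    return 1.0 - fnr


def lincoln_petersen(n_first, n_second, n_both):
    """Estimate a population size from two independent samples.

    n_first  -- items observed in the first sample
    n_second -- items observed in the second sample
    n_both   -- items observed in both samples (must be positive)
    """
    if n_both <= 0:
        raise ValueError("need at least one item seen in both samples")
    return n_first * n_second / n_both


# Overall (FDR, FNR) pairs as reported in the abstract.
REPORTED = {
    "yeast": (0.099, 0.51),
    "worm": (0.132, 0.42),
    "fly": (0.170, 0.28),
}

if __name__ == "__main__":
    for organism, (fdr, fnr) in REPORTED.items():
        precision = precision_from_fdr(fdr)
        recall = recall_from_fnr(fnr)
        print(f"{organism}: precision={precision:.3f}, recall={recall:.2f}")
```

The complement relations hold by definition. The two-sample estimator additionally assumes independent samples with equal capture probability for every item, which is the kind of homogeneity assumption that heterogeneous bait-specific error rates would violate.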
