A statistical framework for combining and interpreting proteomic datasets

MOTIVATION To identify accurately protein function on a proteome-wide scale requires integrating data within and between high-throughput experiments. High-throughput proteomic datasets often have high rates of errors and thus yield incomplete and contradictory information. In this study, we develop a simple statistical framework using Bayes' law to interpret such data and combine information from different high-throughput experiments. In order to illustrate our approach we apply it to two protein complex purification datasets. RESULTS Our approach shows how to use high-throughput data to calculate accurately the probability that two proteins are part of the same complex. Importantly, our approach does not need a reference set of verified protein interactions to determine false positive and false negative error rates of protein association. We also demonstrate how to combine information from two separate protein purification datasets into a combined dataset that has greater coverage and accuracy than either dataset alone. In addition, we also provide a technique for estimating the total number of proteins which can be detected using a particular experimental technique. AVAILABILITY A suite of simple programs to accomplish some of the above tasks is available at www.unm.edu/~compbio/software/DatasetAssess

[1]  Frank Harary,et al.  Graph Theory , 2016 .

[2]  Mark Gerstein,et al.  Bridging structural biology and genomics: assessing protein interaction data with known complexes. , 2002, Trends in genetics : TIG.

[3]  R. Hilborn,et al.  The Ecological Detective: Confronting Models with Data , 1997 .

[4]  R. Ozawa,et al.  A comprehensive two-hybrid analysis to explore the yeast protein interactome , 2001, Proceedings of the National Academy of Sciences of the United States of America.

[5]  James R. Knight,et al.  A comprehensive analysis of protein–protein interactions in Saccharomyces cerevisiae , 2000, Nature.

[6]  H. Herzel,et al.  Is there a bias in proteome research? , 2001, Genome research.

[7]  Gary D Bader,et al.  Systematic identification of protein complexes in Saccharomyces cerevisiae by mass spectrometry , 2002, Nature.

[8]  S. Fields,et al.  A novel genetic system to detect protein–protein interactions , 1989, Nature.

[9]  A. Barabasi,et al.  Lethality and centrality in protein networks , 2001, Nature.

[10]  S. Gygi,et al.  Evaluation of two-dimensional gel electrophoresis-based proteome analysis technology. , 2000, Proceedings of the National Academy of Sciences of the United States of America.

[11]  Yoshihide Hayashizaki,et al.  Interaction generality, a measurement to assess the reliability of a protein-protein interaction. , 2002, Nucleic acids research.

[12]  Hierarchical Organization of Modularity in Metabolic Networks Supporting Online Material , 2002 .

[13]  Duncan J. Watts,et al.  Collective dynamics of ‘small-world’ networks , 1998, Nature.

[14]  A. Barabasi,et al.  Hierarchical Organization of Modularity in Metabolic Networks , 2002, Science.

[15]  P. Bork,et al.  Functional organization of the yeast proteome by systematic analysis of protein complexes , 2002, Nature.

[16]  D. Goldberg,et al.  Assessing experimentally derived interactions in a small world , 2003, Proceedings of the National Academy of Sciences of the United States of America.

[17]  D. Kell,et al.  High-throughput classification of yeast mutants for functional genomics using metabolic footprinting , 2003, Nature Biotechnology.

[18]  A. Wagner The yeast protein interaction network evolves rapidly and contains few redundant duplicate genes. , 2001, Molecular biology and evolution.

[19]  Béla Bollobás,et al.  Random Graphs , 1985 .

[20]  Yoshihide Hayashizaki,et al.  Construction of reliable protein-protein interaction networks with a new interaction generality measure , 2003, Bioinform..

[21]  C. Deane,et al.  Protein Interactions , 2002, Molecular & Cellular Proteomics.

[22]  B. Snel,et al.  Comparative assessment of large-scale data sets of protein–protein interactions , 2002, Nature.

[23]  Gary D Bader,et al.  A Combined Experimental and Computational Strategy to Define Protein Interaction Networks for Peptide Recognition Modules , 2001, Science.

[24]  Ronald W. Davis,et al.  Systematic screen for human disease genes in yeast , 2002, Nature Genetics.