Mining gene functional networks to improve mass-spectrometry-based protein identification

Motivation: High-throughput protein identification experiments based on tandem mass spectrometry (MS/MS) often suffer from low sensitivity and low-confidence protein identifications. A typical shotgun proteomics experiment assumes that all proteins are equally likely to be present. However, other evidence often suggests that a protein is present, and confidence in individual protein identifications can be updated accordingly.

Results: We develop a method that analyzes MS/MS experiments in the larger context of the biological processes active in a cell. Our method, MSNet, improves protein identification in shotgun proteomics experiments by considering information on functional associations from a gene functional network. MSNet substantially increases the number of proteins identified in a sample at a given error rate. Applied to yeast grown in different experimental conditions and analyzed on different MS/MS instruments, MSNet identifies 8–29% more proteins than the original MS experiment; applied to a human sample, it identifies 37% more proteins. We validate up to 94% of our identifications in yeast by their presence in ground-truth reference sets.

Availability and Implementation: Software and datasets are available at http://aug.csres.utexas.edu/msnet

Contact: miranker@cs.utexas.edu, marcotte@icmb.utexas.edu

Supplementary information: Supplementary data are available at Bioinformatics online.
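The abstract does not spell out how MSNet combines MS/MS evidence with the network, so the following is only an illustrative sketch of the general idea of updating a protein's confidence from its functional neighbors. The function name `propagate_scores`, the mixing weight `gamma`, the iteration count, and the toy scores and edges are all assumptions made for this example and are not taken from the paper or its software.

```python
def propagate_scores(initial, edges, gamma=0.5, iterations=20):
    """Blend each protein's own MS/MS score with the weighted mean of its
    network neighbors' current scores (hypothetical update rule)."""
    # Build a symmetric weighted adjacency map restricted to known proteins.
    neighbors = {p: {} for p in initial}
    for a, b, w in edges:
        if a in neighbors and b in neighbors:
            neighbors[a][b] = w
            neighbors[b][a] = w

    scores = dict(initial)
    for _ in range(iterations):
        updated = {}
        for p, own in initial.items():
            total_w = sum(neighbors[p].values())
            if total_w == 0:
                # No network evidence: keep the original MS/MS score.
                updated[p] = own
                continue
            net = sum(w * scores[q] for q, w in neighbors[p].items()) / total_w
            updated[p] = (1 - gamma) * own + gamma * net
        scores = updated
    return scores


# Toy data (invented): C and D have the same weak MS/MS score, but only C
# is functionally linked to the confidently identified proteins A and B.
initial = {"A": 0.9, "B": 0.9, "C": 0.2, "D": 0.2}
edges = [("A", "B", 1.0), ("A", "C", 1.0), ("B", "C", 1.0)]
scores = propagate_scores(initial, edges)
for protein in sorted(scores):
    print(protein, round(scores[protein], 3))
```

In this toy setting the weakly scored protein C ends up ranked above the unconnected protein D, which mirrors the intuition in the abstract that functional context can raise confidence in borderline identifications. A real analysis would still need an error-rate estimate for the re-ranked list.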
