Bioinformatics analysis of mass spectrometry‐based proteomics data sets

Proteomics has made tremendous progress, attaining a throughput and comprehensiveness previously seen only in genomics technologies. The resulting avalanche of proteome-level data poses great analytical challenges for downstream interpretation. We review the bioinformatic analysis of qualitative and quantitative proteomic data, focusing on current and emerging paradigms for functional analysis, data mining and knowledge discovery from high-resolution quantitative mass spectrometric data. Many bioinformatics tools developed for microarrays can be reused in proteomics; however, the uniquely quantitative nature of proteomics data also offers entirely novel analysis possibilities that directly suggest and illuminate biological mechanisms.
