Bioinformatic analysis of proteomics data

Most biochemical reactions in a cell are regulated by highly specialized proteins, which are the prime mediators of the cellular phenotype. The identification, quantitation and characterization of all proteins in a cell are therefore of utmost importance for understanding the molecular processes that mediate cellular physiology. With the advent of robust and reliable mass spectrometers that can analyze complex protein mixtures within a reasonable timeframe, the systematic analysis of all proteins in a cell has become feasible. Alongside ongoing improvements in analytical hardware, standardized methods for analyzing and studying all proteins must be developed, so that new, testable hypotheses can be generated from the enormous amount of pre-existing biological information. Here we discuss current strategies for gathering, filtering and analyzing proteomic data sets with available software packages.
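The abstract does not specify how pre-existing biological information is used to generate hypotheses. One widely used approach is annotation over-representation analysis, for example testing whether a Gene Ontology term is enriched among regulated proteins. The following is a minimal sketch under stated assumptions, not the method of any particular package. It applies a one-sided hypergeometric test per annotation term and then a Benjamini-Hochberg correction across terms. The function names and example counts are hypothetical. Real tools differ in background definition, test statistic and correction method.

```python
from __future__ import annotations

from math import comb


def hypergeom_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided hypergeometric p-value P(X >= k) for over-representation.

    N: proteins in the background (assumed: all proteins identified/quantified)
    K: background proteins carrying the annotation term
    n: proteins in the list of interest (e.g. significantly regulated)
    k: proteins in that list carrying the annotation term
    """
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("inconsistent counts")
    total = comb(N, n)
    # Sum the probabilities of observing k or more annotated proteins by chance;
    # math.comb returns 0 when the lower argument exceeds the upper one.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / total


def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values (false discovery rate), in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical counts: (k, K) per term, with n = 100 regulated proteins
    # drawn from a background of N = 2000 quantified proteins.
    terms = {"term_A": (10, 40), "term_B": (3, 60), "term_C": (1, 25)}
    raw = [hypergeom_enrichment_p(k, 100, K, 2000) for k, K in terms.values()]
    for name, p, q in zip(terms, raw, benjamini_hochberg(raw)):
        print(f"{name}: p={p:.3g} q={q:.3g}")
```

Using the proteins actually detected in the experiment as the background, instead of the whole annotated proteome, is an assumption here. It is a common recommendation because mass spectrometry tends to under-sample low-abundance proteins, and an unmatched background can inflate apparent enrichment.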
