Data Mining in Proteomics

In the past decade, proteomics has seen major developments in instrumentation and methodology. Several techniques are now state of the art for proteome investigations of complex biological samples derived from cell cultures, tissues, or whole organisms. In particular, many improvements have been made in quantifying differences in protein expression between samples, e.g., treated vs. untreated cells or patients vs. healthy controls. In this review, we give a brief insight into the main techniques, including gel-based protein separation techniques and the growing field of mass spectrometry. The proteome describes the quantitative expression of genes within, e.g., a cell, a tissue, or a body fluid at specific time points and under defined circumstances (1). In contrast to the genome, the proteome is highly dynamic, and the protein expression pattern of cells in an organism varies with physiological function, differentiation status, and environmental factors. In addition, alternative splicing of mRNAs and a broad range of posttranslational modifications (e.g., phosphorylation, glycosylation, and ubiquitination) increase proteome complexity (2, 3). Transcription analysis likewise gives no insight into degradation and transport phenomena, alternative splicing, or posttranslational modifications. Furthermore, mRNA and protein levels often do not correlate (4, 5). None of these influences is captured by genome analysis, which underlines the importance of proteome analysis for obtaining deeper insight into cellular functions. In general, proteome analysis provides a snapshot of the proteins expressed in a cell or tissue at a defined time point (1). Indeed, not only qualitative analysis resulting in a defined “protein inventory”

[1]  S. Blair Hedges,et al.  The origin and evolution of model organisms , 2002, Nature Reviews Genetics.

[2]  Michael Hamacher,et al.  HBPP and the pursuit of standardisation , 2003, The Lancet Neurology.

[3]  Merlin Crossley,et al.  Sticky fingers: zinc-fingers as protein-recognition motifs. , 2007, Trends in biochemical sciences.

[4]  Lennart Martens,et al.  Toward a Successful Clinical Neuroproteomics The 11th HUPO Brain Proteome Project Workshop 3 March, 2009, Kolymbari, Greece , 2009, Proteomics. Clinical applications.

[5]  Ozlem Keskin,et al.  PRISM: protein-protein interaction prediction by structural matching. , 2008, Methods in molecular biology.

[6]  Gustavo Caetano-Anollés,et al.  The evolutionary mechanics of domain organization in proteomes and the rise of modularity in the protein world. , 2009, Structure.

[7]  A. Schechter,et al.  Sickle hemoglobin polymerization in solution and in cells. , 1985, Annual review of biophysics and biophysical chemistry.

[8]  A. Force,et al.  The probability of preservation of a newly arisen gene duplicate. , 2001, Genetics.

[9]  A. Barabasi,et al.  Functional and topological characterization of protein interaction networks , 2004, Proteomics.

[10]  Jeroen Krijgsveld,et al.  Metabolic labeling of C. elegans and D. melanogaster for quantitative proteomics , 2003, Nature Biotechnology.

[11]  P. Bourgine,et al.  Topological and causal structure of the yeast transcriptional regulatory network , 2002, Nature Genetics.

[12]  D. Eisenberg,et al.  Domain swapping: entangling alliances between proteins. , 1994, Proceedings of the National Academy of Sciences of the United States of America.

[13]  Julie A. Hines,et al.  A proteome-wide protein interaction map for Campylobacter jejuni , 2007, Genome Biology.

[14]  Eugene A. Kapp,et al.  Overview of the HUPO Plasma Proteome Project: Results from the pilot phase with 35 collaborating laboratories and multiple analytical groups, generating a core dataset of 3020 proteins and a publicly‐available database , 2005, Proteomics.

[15]  Carlos Prieto,et al.  APID2NET: unified interactome graphic analyzer , 2007, Bioinform..

[16]  J. Wojcik,et al.  The protein–protein interaction map of Helicobacter pylori , 2001, Nature.

[17]  Dmitrij Frishman,et al.  The MIPS mammalian protein–protein interaction database , 2005, Bioinform..

[18]  J. Lill,et al.  Proteomic tools for quantitation by mass spectrometry. , 2003, Mass spectrometry reviews.

[19]  Robert E. Akins,et al.  Separation of proteins using cetyltrimethylammonium bromide discontinuous gel electrophoresis , 1994, Molecular biotechnology.

[20]  S. Kanaya,et al.  Large-scale identification of protein-protein interaction of Escherichia coli K-12. , 2006, Genome research.

[21]  S. Hanash,et al.  Challenges in deriving high-confidence protein identifications from data gathered by a HUPO plasma proteome collaborative study , 2006, Nature Biotechnology.

[22]  Lan V. Zhang,et al.  Evidence for dynamically organized modularity in the yeast protein–protein interaction network , 2004, Nature.

[23]  Martin Vingron,et al.  IntAct: an open source molecular interaction database , 2004, Nucleic Acids Res..

[24]  Andrew Meade,et al.  Assembly rules for protein networks derived from phylogenetic-statistical analysis of whole genomes , 2007, BMC Evolutionary Biology.

[25]  Martin Eisenacher,et al.  Proteomics Data Collection (ProDaC): Publishing and Collecting Proteomics Data Sets in Public Repositories Using Standard Formats , 2010, Proteome Bioinformatics.

[26]  E I Shakhnovich,et al.  Structural similarity enhances interaction propensity of proteins. , 2006, Journal of molecular biology.

[27]  Lennart Martens,et al.  Functional annotation of proteins identified in human brain during the HUPO Brain Proteome Project pilot study , 2006, Proteomics.

[28]  F. Vandenesch,et al.  Isotope-labeled Protein Standards , 2007, Molecular & Cellular Proteomics.

[29]  J. Yates,et al.  A model for random sampling and estimation of relative protein abundance in shotgun proteomics. , 2004, Analytical chemistry.

[30]  M. Lynch,et al.  The evolutionary fate and consequences of duplicate genes. , 2000, Science.

[31]  I. Ispolatov,et al.  Binding properties and evolution of homodimers in protein–protein interaction networks , 2005, Nucleic acids research.

[32]  Mike Tyers,et al.  BioGRID: a general repository for interaction datasets , 2005, Nucleic Acids Res..

[33]  Erich Bornberg-Bauer,et al.  Convergent evolution of gene networks by single‐gene duplications in higher eukaryotes , 2004, EMBO reports.

[34]  Lennart Martens,et al.  A comparison of the HUPO Brain Proteome Project pilot with other proteomics studies , 2006, Proteomics.

[35]  Sarah A Teichmann,et al.  Novel specificities emerge by stepwise duplication of functional modules. , 2005, Genome research.

[36]  Lennart Martens,et al.  PRIDE: The proteomics identifications database , 2005, Proteomics.

[37]  Patrick Aloy.  Shaping the future of interactome networks , 2007, Genome Biology.

[38]  Lennart Björkesten,et al.  Differential expression analysis of Escherichia coli proteins using a novel software for relative quantitation of LC‐MS/MS data , 2006, Proteomics.

[39]  D Eisenberg,et al.  3D domain swapping: A mechanism for oligomer assembly , 1995, Protein science : a publication of the Protein Society.

[40]  Jianmin Wu,et al.  Integrated network analysis platform for protein-protein interactions , 2009, Nature Methods.

[41]  Lennart Martens,et al.  PRIDE: a public repository of protein and peptide identifications for the proteomics community , 2005, Nucleic Acids Res..

[42]  Lennart Martens,et al.  The human platelet proteome mapped by peptide‐centric proteomics: A functional protein profile , 2005, Proteomics.

[43]  Antoine H P America,et al.  Comparative LC‐MS: A landscape of peaks and valleys , 2008, Proteomics.

[44]  Katrin Marcus,et al.  Quantitative analysis of highly homologous proteins: the challenge of assaying the “CYP-ome” by mass spectrometry , 2008, Analytical and bioanalytical chemistry.

[45]  Alfonso Valencia,et al.  Bioinformatics in the human interactome project , 2006, Bioinform..

[46]  R. Solé,et al.  Evolving protein interaction networks through gene duplication. , 2003, Journal of Theoretical Biology.

[47]  Michael Hamacher,et al.  Great mood in proteomics: Beijing and the HUPO Human Brain Proteome Project , 2005, Proteomics.

[48]  Martin Eisenacher,et al.  Proteomics today: Bioinformatics at its best. Proteomics and Bioinformatics – an inseparable couple , 2008, Proteomics.

[49]  Teresa M. Przytycka,et al.  DOMINE: a database of protein domain interactions , 2007, Nucleic Acids Res..

[50]  S. L. Wong,et al.  A Map of the Interactome Network of the Metazoan C. elegans , 2004, Science.

[51]  Dennis P Wall,et al.  A simple dependence between protein evolution rate and the number of protein-protein interactions , 2003, BMC Evolutionary Biology.

[52]  R. Beynon,et al.  Absolute Multiplexed Quantitative Analysis of Protein Expression during Muscle Development Using QconCAT , 2007, Molecular & Cellular Proteomics.

[53]  P. Shannon,et al.  Cytoscape: a software environment for integrated models of biomolecular interaction networks. , 2003, Genome research.

[54]  H. M. Farrell,et al.  Charge separation of proteins complexed with sodium dodecyl sulfate by acid gel electrophoresis in the presence of cetyltrimethylammonium bromide. , 1979, Biochimica et biophysica acta.

[55]  Ioannis Xenarios,et al.  DIP: the Database of Interacting Proteins , 2000, Nucleic Acids Res..

[56]  James R. Knight,et al.  A comprehensive analysis of protein–protein interactions in Saccharomyces cerevisiae , 2000, Nature.

[57]  Ruedi Aebersold,et al.  The Need for Guidelines in Publication of Peptide and Protein Identification Data , 2004, Molecular & Cellular Proteomics.

[58]  H. Lehrach,et al.  A Human Protein-Protein Interaction Network: A Resource for Annotating the Proteome , 2005, Cell.

[59]  I. Jurisica,et al.  Unequal evolutionary conservation of human protein interactions in interologous networks , 2007, Genome Biology.

[60]  Roded Sharan,et al.  A direct comparison of protein interaction confidence assignment schemes , 2006, BMC Bioinformatics.

[61]  John Kuriyan,et al.  The origin of protein interactions and allostery in colocalization , 2007, Nature.

[62]  Luisa Montecchi Palazzi,et al.  Comparative interactomics , 2005, FEBS letters.

[63]  R. Karp,et al.  Conserved pathways within bacteria and yeast as revealed by global protein network alignment , 2003, Proceedings of the National Academy of Sciences of the United States of America.

[64]  Zsuzsanna Dosztányi,et al.  Prediction of Protein Binding Regions in Disordered Proteins , 2009, PLoS Comput. Biol..

[65]  M. Curie,et al.  Conservation and topology of protein interaction networks under duplication-divergence evolution , 2008 .

[66]  Arun K. Ramani,et al.  How complete are current yeast and human protein-interaction networks? , 2006, Genome Biology.

[67]  Lennart Martens,et al.  Getting a grip on proteomics data – Proteomics Data Collection (ProDaC) , 2009, Proteomics.

[68]  R. Appel,et al.  Guidelines for the next 10 years of proteomics , 2009, Proteomics.

[69]  Lennart Martens,et al.  Proteomics Data Collection – 5th ProDaC Workshop 4 March 2009, Kolympari, Crete, Greece , 2009, Proteomics.

[70]  M. Vidal,et al.  Effect of sampling on topology predictions of protein-protein interaction networks , 2005, Nature Biotechnology.

[71]  Helmut E Meyer,et al.  Data handling and processing in proteomics , 2009, Expert review of proteomics.

[72]  M. Mann,et al.  SILAC Mouse for Quantitative Proteomics Uncovers Kindlin-3 as an Essential Factor for Red Blood Cell Function , 2008, Cell.

[73]  Arun K. Ramani,et al.  Protein interaction networks from yeast to human. , 2004, Current opinion in structural biology.

[74]  Erich Bornberg-Bauer,et al.  Finding Common Protein Interaction Patterns Across Organisms , 2006 .

[75]  Eugene V Koonin,et al.  No simple dependence between protein evolution rate and the number of protein-protein interactions: only the most prolific interactors tend to evolve slowly , 2003, BMC Evolutionary Biology.

[76]  Dipanwita Roy Chowdhury,et al.  Human protein reference database as a discovery resource for proteomics , 2004, Nucleic Acids Res..

[77]  R. Russell,et al.  Linear motifs: Evolutionary interaction switches , 2005, FEBS letters.

[78]  Xiang-Sun Zhang,et al.  Hubs with Network Motifs Organize Modularity Dynamically in the Protein-Protein Interaction Network of Yeast , 2007, PloS one.

[79]  K. Resing,et al.  Comparison of Label-free Methods for Quantifying Human Proteins by Shotgun Proteomics , 2005, Molecular & Cellular Proteomics.

[80]  Z N Oltvai,et al.  Evolutionary conservation of motif constituents in the yeast protein interaction network , 2003, Nature Genetics.

[81]  Helmut E Meyer,et al.  Valid data from large-scale proteomics studies , 2005, Nature Methods.

[82]  Eric J. Deeds,et al.  Robust protein–protein interactions in crowded cellular environments , 2007, Proceedings of the National Academy of Sciences.

[83]  Lennart Martens,et al.  Automated reprocessing pipeline for searching heterogeneous mass spectrometric data of the HUPO Brain Proteome Project pilot phase , 2006, Proteomics.

[84]  A. Wagner,et al.  Structure and evolution of protein interaction networks: a statistical model for link dynamics and gene duplications , 2002, BMC Evolutionary Biology.

[85]  M. Mann,et al.  Mass spectrometry–based proteomics turns quantitative , 2005, Nature chemical biology.

[86]  Pedro Beltrão,et al.  Specificity and Evolvability in Eukaryotic Protein Interaction Networks , 2007, PLoS Comput. Biol..

[87]  V. Barbosa,et al.  Identifying differences in protein expression levels by spectral counting and feature selection. , 2008, Genetics and molecular research : GMR.

[88]  C. Adami,et al.  Apparent dependence of protein evolutionary rate on number of interactions is linked to biases in protein–protein interactions data sets , 2003, BMC Evolutionary Biology.

[89]  Michael Hamacher,et al.  “Does understanding the brain need proteomics and does understanding proteomics need brains?” – Second HUPO HBPP Workshop hosted in Paris , 2004, Proteomics.

[90]  Geoffrey J. Barton,et al.  PIPs: human protein–protein interaction prediction database , 2008, Nucleic Acids Res..

[91]  U. Alon,et al.  Spontaneous evolution of modularity and network motifs. , 2005, Proceedings of the National Academy of Sciences of the United States of America.

[92]  A. Barabasi,et al.  High-Quality Binary Protein Interaction Map of the Yeast Interactome Network , 2008, Science.

[93]  Lennart Martens,et al.  HUPO Brain Proteome Project: Summary of the pilot phase and introduction of a comprehensive data reprocessing strategy , 2006, Proteomics.

[94]  R. Karp,et al.  Conserved patterns of protein interaction in multiple species , 2005 .

[95]  Sam Hanash,et al.  HUPO Initiatives Relevant to Clinical Proteomics , 2004, Molecular & Cellular Proteomics.

[96]  A. E. Hirsh,et al.  Evolutionary rate depends on number of protein-protein interactions independently of gene expression level , 2004, BMC Evolutionary Biology.

[97]  Maria Victoria Schneider,et al.  MINT: a Molecular INTeraction database. , 2002, FEBS letters.

[98]  R. Ozawa,et al.  A comprehensive two-hybrid analysis to explore the yeast protein interactome , 2001, Proceedings of the National Academy of Sciences of the United States of America.

[99]  G S Omenn,et al.  A common uptake system for serotonin and dopamine in human platelets. , 1978, The Journal of clinical investigation.

[100]  C. Deane,et al.  Protein Interactions , 2002, Molecular & Cellular Proteomics.

[101]  M. Vignali,et al.  A protein interaction network of the malaria parasite Plasmodium falciparum , 2005, Nature.

[102]  James R. Knight,et al.  A Protein Interaction Map of Drosophila melanogaster , 2003, Science.

[103]  Philip M. Kim,et al.  Relating Three-Dimensional Structures to Protein Networks Provides Evolutionary Insights , 2006, Science.