Halobacterium salinarum NRC-1 PeptideAtlas: toward strategies for targeted proteomics and improved proteome coverage.

The relatively small numbers of proteins and fewer possible post-translational modifications in microbes provide a unique opportunity to comprehensively characterize their dynamic proteomes. We have constructed a PeptideAtlas (PA) covering 62.7% of the predicted proteome of the extremely halophilic archaeon Halobacterium salinarum NRC-1 by compiling approximately 636 000 tandem mass spectra from 497 mass spectrometry runs in 88 experiments. Analysis of the PA with respect to biophysical properties of constituent peptides, functional properties of parent proteins of detected peptides, and performance of different mass spectrometry approaches has highlighted plausible strategies for improving proteome coverage and selecting signature peptides for targeted proteomics. Notably, discovery of a significant correlation between absolute abundances of mRNAs and proteins has helped identify low abundance of proteins as the major limitation in peptide detection. Furthermore, we have discovered that iTRAQ labeling for quantitative proteomic analysis introduces a significant bias in peptide detection by mass spectrometry. Therefore, despite identifying at least one proteotypic peptide for almost all proteins in the PA, a context-dependent selection of proteotypic peptides appears to be the most effective approach for targeted proteomics.
