Mass spectrometry at the interface of proteomics and genomics

With the advent of modern DNA sequencing technologies, genomics is experiencing a revolution in the quantity and quality of sequencing data. The rapidly growing number of sequenced genomes and metagenomes presents a tremendous challenge for the bioinformatics tools that predict protein-coding regions. Experimental evidence of expressed genomic regions, at both the RNA and protein levels, is becoming invaluable for genome annotation and for training gene prediction algorithms. Evidence of gene expression at the protein level, obtained by mass spectrometry-based proteomics, is increasingly used to refine raw genome sequencing data. In a typical "proteogenomics" experiment, the whole proteome of an organism is extracted, digested into peptides, and measured by a mass spectrometer. The peptide fragmentation spectra are identified by searching against a six-frame translation of the raw genomic assembly, which enables the identification of previously unpredicted protein-coding genomic regions. Applying mass spectrometry to genome annotation poses a range of challenges to standard proteomics workflows, especially in terms of proteome coverage and database search strategies. Here we provide an overview of the field and argue that the latest mass spectrometry technologies, which combine high mass accuracy with high acquisition rates, will prove especially well suited to proteogenomics applications.
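As an illustration of the database construction step mentioned above, the following is a minimal sketch of a six-frame translation followed by an in-silico tryptic digest. It is not the workflow of any particular study or search engine. It assumes the standard genetic code, treats every stop-free stretch as a candidate coding region, and uses the common simplified trypsin rule (cleave after K or R unless the next residue is P) with no missed cleavages. All function names and the `min_len` parameter are hypothetical and chosen only for this example.

```python
from itertools import product

# Standard genetic code (assumption: nuclear code, NCBI translation table 1).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(dna):
    """Return the reverse complement of an upper-case A/C/G/T string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons; unknown codons (e.g. containing N) become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translation(dna):
    """Translate three reading frames on each strand, keyed '+1'..'+3', '-1'..'-3'."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


def tryptic_peptides(protein, min_len=6):
    """Simplified trypsin rule: cleave after K/R unless followed by P."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        is_last = i == len(protein) - 1
        if is_last or (aa in "KR" and protein[i + 1] != "P"):
            peptide = protein[start:i + 1]
            if len(peptide) >= min_len:
                peptides.append(peptide)
            start = i + 1
    return peptides


def build_peptide_database(dna, min_len=6):
    """Map each candidate peptide to the set of reading frames that encode it."""
    database = {}
    for frame, protein in six_frame_translation(dna).items():
        # Stop codons ('*') split each frame into stop-free stretches.
        for stretch in protein.split("*"):
            for peptide in tryptic_peptides(stretch, min_len):
                database.setdefault(peptide, set()).add(frame)
    return database
```

Calling `build_peptide_database(assembly_sequence)` on a contig yields candidate peptides from all six frames. Most of these will not correspond to real proteins, which is one reason the search space, and with it the database search strategy, becomes a central concern when spectra are matched against a translated genome rather than a curated protein database.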
