Detection of Unknown Amino Acid Substitutions Using Error-Tolerant Database Search

Recent studies have demonstrated that mass spectrometry-based variant detection is feasible. Typically, customized target databases built from either genomic variant databases or transcript data are used to identify single amino acid variants (SAAVs) in mass spectrometry data. However, both approaches require additional data beyond the spectra themselves. Here, we discuss the application of an error-tolerant peptide search engine such as BICEPS to identify variants based exclusively on standard UniProt databases. This avoids unnecessary and redundant extensions of the search space. The workflow provides an unbiased view of the data: the search space is not limited to known variants, and no additional data are required. In a subsequent step, a second identification search is performed to verify the initially identified variant peptides and to aggregate information at the protein level.
