Flying blind, or just flying under the radar? The underappreciated power of de novo methods of mass spectrometric peptide identification

Mass spectrometry‐based proteomics is a popular and powerful method for precise and highly multiplexed protein identification. The most common method of analyzing untargeted proteomics data is database searching, where the database is simply a collection of protein sequences from the target organism, derived from genome sequencing. Experimental peptide tandem mass spectra are compared to simplified models of theoretical spectra calculated from the translated genomic sequences. However, in several application areas, such as forensics, archaeology, and venomics, a genome sequence may not be available, or the correct genome sequence to use may not be known. In these cases, de novo peptide identification can play an important role. De novo methods infer the peptide sequence directly from the tandem mass spectrum without reference to a sequence database, usually using graph‐based or machine learning algorithms. In this review, we provide a basic overview of de novo peptide identification methods and applications, briefly covering de novo algorithms and tools, and focusing in more depth on recent applications in venomics, metaproteomics, forensics, and the characterization of antibody drugs.
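
The contrast between the two strategies can be illustrated with a toy example. The sketch below is only an illustration under strong assumptions: it considers singly charged b‐ions only, assumes a complete and noise‐free fragment ladder, and uses a small table of standard monoisotopic residue masses. All function names are hypothetical and do not correspond to any published tool; real search engines and de novo algorithms use far richer scoring models.

```python
# Toy comparison of database searching and de novo sequencing.
# Assumptions (not from the source): b-ions only, charge 1+, no noise,
# a complete fragment ladder, and a reduced amino acid alphabet.

# Standard monoisotopic residue masses in Da (subset of the 20 amino acids).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "N": 114.04293,
    "D": 115.02694, "K": 128.09496, "E": 129.04259, "F": 147.06841,
    "R": 156.10111,
}
PROTON = 1.007276
WATER = 18.010565


def b_ions(peptide):
    """Return theoretical singly charged b-ion m/z values (b1 .. b(n-1))."""
    mzs, running = [], 0.0
    for aa in peptide[:-1]:
        running += RESIDUE_MASS[aa]
        mzs.append(running + PROTON)
    return mzs


def precursor_mh(peptide):
    """Return the singly protonated precursor mass [M+H]+."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON


def shared_peak_count(peaks, peptide, tol=0.02):
    """Database-style score: theoretical b-ions matched by an observed peak."""
    return sum(any(abs(p - t) <= tol for p in peaks) for t in b_ions(peptide))


def de_novo_sequence(peaks, mh, tol=0.02):
    """Read residues from mass differences along the b-ion ladder.

    The ladder is anchored at the proton mass (an empty prefix) and at
    [M+H]+ minus water (the full-length prefix). Gaps that match no
    residue in the table are reported as '?'.
    """
    nodes = [PROTON] + sorted(peaks) + [mh - WATER]
    residues = []
    for low, high in zip(nodes, nodes[1:]):
        delta = high - low
        hits = [aa for aa, m in RESIDUE_MASS.items() if abs(m - delta) <= tol]
        residues.append(hits[0] if hits else "?")
    return "".join(residues)


# Simulated "experimental" spectrum for a hypothetical peptide.
observed = b_ions("SAVER")
mh = precursor_mh("SAVER")

# Database searching: rank candidate sequences from a (tiny) database.
database = ["GAVER", "ASVER", "SAVER", "TNDFK"]
best_hit = max(database, key=lambda pep: shared_peak_count(observed, pep))

# De novo: infer the sequence from the spectrum alone, with no database.
inferred = de_novo_sequence(observed, mh)
```

In this idealized case both routes recover the same peptide, but the database route can only ever return a sequence that is already in `database`, whereas the ladder‐reading route needs no candidate list. With real spectra, missing fragments, noise peaks, and near‐isobaric residues make the de novo problem considerably harder than this sketch suggests.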
