Flexible and Fast Mapping of Peptides to a Proteome with ProteoMapper.

Bottom-up proteomics relies on the proteolytic or chemical cleavage of proteins into peptides, the identification of those peptides via mass spectrometry, and the mapping of the identified peptides back to the reference proteome to infer which proteins may have been identified. Reliable mapping of peptides to proteins still poses substantial challenges when considering similar proteins, protein families, splice isoforms, sequence variation, and possible residue mass modifications, combined with an imperfect and incomplete understanding of the proteome. The ProteoMapper tool enables comprehensive and rapid mapping of peptides to a reference proteome. The indexer component creates a segmented index for an input proteome from a FASTA or PEFF file. The ProMaST component provides ultrafast mapping of one or more input peptides against the index. ProteoMapper allows searches that take into account known sequence variation encoded in PEFF files. It also enables fuzzy searches to find highly similar peptides with residue order changes or other isobaric or near-isobaric substitutions within a specified mass tolerance. We demonstrate an example of a one-hit-wonder identification in PeptideAtlas that may be better explained by a combination of catalogued and uncatalogued sequence variation in another highly observed protein. ProteoMapper is free and open source, and is available for local use after downloading, for embedding in other applications, as an online web tool at http://www.peptideatlas.org/map, and as a web service.
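To make the two search modes concrete, the following is a minimal Python sketch of exact mapping through a k-mer index and of a simple fuzzy comparison. It is not ProteoMapper's implementation: the function names, the k-mer index layout, the `max_core` limit, and the toy proteome are all assumptions made for illustration. The fuzzy rule used here is also a simplification: a same-length window is accepted when the residues that differ from the query form one short stretch whose total mass agrees with the query's within the tolerance, which covers residue order swaps and isobaric substitutions such as I/L.

```python
from collections import defaultdict

# Standard monoisotopic residue masses in Da.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def residue_mass(seq):
    """Sum of residue masses (no terminal water added)."""
    return sum(RESIDUE_MASS[aa] for aa in seq)


def build_index(proteome, k=3):
    """Map each k-mer to the (protein, offset) positions where it starts."""
    index = defaultdict(list)
    for name, seq in proteome.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].append((name, i))
    return index


def map_exact(peptide, proteome, index, k=3):
    """Exact occurrences of a peptide; assumes len(peptide) >= k."""
    return [
        (name, i)
        for name, i in index.get(peptide[:k], [])
        if proteome[name].startswith(peptide, i)
    ]


def fuzzy_match(peptide, window, tol, max_core=3):
    """True if the differing stretch is short and mass-equivalent."""
    if len(peptide) != len(window):
        return False
    n = len(peptide)
    start = 0
    while start < n and peptide[start] == window[start]:
        start += 1
    if start == n:
        return True  # identical sequences
    end = n
    while end > start and peptide[end - 1] == window[end - 1]:
        end -= 1
    core_p, core_w = peptide[start:end], window[start:end]
    if len(core_p) > max_core:
        return False
    return abs(residue_mass(core_p) - residue_mass(core_w)) <= tol


def map_fuzzy(peptide, proteome, tol=0.01):
    """Scan every same-length window; a real tool would use its index."""
    n = len(peptide)
    hits = []
    for name, seq in proteome.items():
        for i in range(len(seq) - n + 1):
            window = seq[i:i + n]
            if fuzzy_match(peptide, window, tol):
                hits.append((name, i, window))
    return hits


# Hypothetical three-protein "proteome" differing at a few residues.
proteome = {
    "P1": "MKTAYIAKQR",
    "P2": "MKTAYLAKQR",  # I -> L (isobaric substitution)
    "P3": "MKATYIAKQR",  # TA -> AT (residue order change)
}
index = build_index(proteome)
exact_hits = map_exact("TAYIAK", proteome, index)
fuzzy_hits = map_fuzzy("TAYIAK", proteome)
```

In this toy case the exact search finds the peptide only in `P1`, while the fuzzy search also reports the I/L variant in `P2` and the swapped residues in `P3`. Widening `tol` would additionally admit near-isobaric substitutions such as K/Q, whose residue masses differ by about 0.036 Da.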
