Extending Comet for Global Amino Acid Variant and Post‐Translational Modification Analysis Using the PSI Extended FASTA Format

Protein identification by tandem mass spectrometry sequence database searching is standard practice in many proteomics laboratories. The de facto standard for representing the sequence databases used as input to search tools is the FASTA format. The Human Proteome Organization's Proteomics Standards Initiative has developed an extension to FASTA, the PSI extended FASTA format (PEFF), in which additional information such as structural annotations is encoded in the protein description lines. Comet has been extended to automatically analyze the post-translational modifications and amino acid substitutions encoded in PEFF databases. This work presents Comet's PEFF implementation and example results from searching a HEK293 dataset against the neXtProt PEFF database.
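
To make the idea of annotations encoded in the description line concrete, the following is a minimal Python sketch of reading a PEFF-style header and applying one of its encoded substitutions. It is not Comet's implementation. The header, accession, and sequence are invented for illustration, and the sketch assumes the `\ModResPsi=(position|accession|name)` and `\VariantSimple=(position|residue)` key layouts with 1-based positions; the helper names `parse_peff_header`, `split_items`, and `apply_variant` are hypothetical.

```python
import re

# Hypothetical PEFF-style description line and sequence (made up for illustration).
HEADER = (
    ">nxp:NX_EXAMPLE-1 \\PName=Example protein "
    "\\ModResPsi=(5|MOD:00046|O-phospho-L-serine)(8|MOD:00047|O-phospho-L-threonine) "
    "\\VariantSimple=(3|K)(7|R)"
)
SEQUENCE = "MAGTSLLTK"


def parse_peff_header(line):
    """Split a header into (database prefix, accession, {key: raw value})."""
    ident, _, rest = line[1:].partition(" ")
    prefix, _, accession = ident.partition(":")
    fields = {}
    # Each annotation is a backslash-prefixed key, '=', then a value that runs
    # until the next backslash-prefixed key or the end of the line.
    for match in re.finditer(r"\\(\w+)=(.*?)(?=\s+\\\w+=|$)", rest):
        fields[match.group(1)] = match.group(2).strip()
    return prefix, accession, fields


def split_items(value):
    """Turn '(a|b)(c|d)' into [['a', 'b'], ['c', 'd']].

    Simplification: assumes no nested parentheses inside an item.
    """
    return [item.split("|") for item in re.findall(r"\(([^()]*)\)", value)]


def apply_variant(sequence, position, residue):
    """Return the sequence with the residue at a 1-based position substituted."""
    i = int(position) - 1
    return sequence[:i] + residue + sequence[i + 1:]


prefix, accession, fields = parse_peff_header(HEADER)
mods = split_items(fields["ModResPsi"])
variants = split_items(fields["VariantSimple"])

# One candidate sequence per encoded single amino acid substitution.
variant_sequences = [apply_variant(SEQUENCE, pos, aa) for pos, aa in variants]
```

A search tool working from such a header would consider each annotated modification and substitution only at its stated position, which keeps the search space far smaller than allowing every modification on every residue.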
