Evaluation of top-down mass spectral identification with homologous protein sequences

Background: Top-down mass spectrometry has unique advantages in identifying proteoforms with multiple post-translational modifications and/or unknown alterations. Most software tools in this area identify proteoforms by searching top-down mass spectra against a protein sequence database. When no proteome sequence database is available for the species studied in a mass spectrometry experiment, a homologous protein sequence database can be used instead. The accuracy of the homologous protein sequences affects both the sensitivity of proteoform identification and the accuracy of mass shift localization.

Results: We tested TopPIC, a commonly used software tool for top-down mass spectral identification, on a top-down mass spectrometry data set of Escherichia coli K12 MG1655, and evaluated its performance using an Escherichia coli K12 MG1655 proteome database and a homologous protein database. The number of spectra identified with the homologous database was about half of that identified with the Escherichia coli K12 MG1655 database. We also tested TopPIC on a top-down mass spectrometry data set of human MCF-7 cells and obtained similar results.

Conclusions: The experimental results demonstrate that TopPIC can identify many proteoform spectrum matches and localize unknown alterations using homologous protein sequences that contain no more than 2 mutations.
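To illustrate why sequence accuracy matters for mass shift localization, the following minimal sketch computes the mass shift that a single amino acid substitution in a database sequence would introduce relative to the true sequence. It is not part of TopPIC and does not reproduce its algorithm. The function names `substitution_shift` and `mismatch_shifts` and the example sequences are hypothetical, and the sketch assumes equal-length sequences that differ only by substitutions. The residue masses are standard monoisotopic values.

```python
# Standard monoisotopic residue masses (Da) of the 20 common amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def substitution_shift(db_residue: str, true_residue: str) -> float:
    """Mass shift observed when the database holds db_residue
    but the measured proteoform actually contains true_residue."""
    return MONO[true_residue] - MONO[db_residue]


def mismatch_shifts(db_seq: str, true_seq: str):
    """List (1-based position, db residue, true residue, shift in Da)
    for every substituted position. Hypothetical helper: assumes the two
    sequences have equal length and differ only by substitutions."""
    if len(db_seq) != len(true_seq):
        raise ValueError("sequences must have equal length")
    return [
        (i + 1, d, t, substitution_shift(d, t))
        for i, (d, t) in enumerate(zip(db_seq, true_seq))
        if d != t
    ]


if __name__ == "__main__":
    # Made-up example: homologous entry has A where the true protein has S.
    for pos, d, t, shift in mismatch_shifts("MKALV", "MKSLV"):
        print(f"position {pos}: {d}->{t}, shift {shift:+.5f} Da")
```

Under these assumptions, each mutation in the homologous sequence appears to the search as one additional unexplained mass shift, which is consistent with identification becoming harder as the number of mutations grows. Substitutions between isobaric residues such as leucine and isoleucine produce no shift at all.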
