A pipeline for identifying endogenous neuropeptides from spectral archives

Shotgun proteomics experiments often produce large amounts of spectral data, yet a large fraction of the spectra remain unidentified. Many unidentified spectra that are highly likely to originate from peptides can be revealed by data mining methods such as clustering. This idea has motivated researchers to build 'spectral archives' in order to identify more peptides from previously analysed resources. Our objective is to establish a general way to identify peptides for these high-probability spectra in spectral archives, helping biologists obtain more results from their data. Here we propose a novel generic pipeline for this approach, based on the PRIDE Cluster resources rather than on building a complete archive from scratch. We applied the pipeline to test the identification of endogenous neuropeptides in rat. Thirty-three spectra with a high probability of being peptide-derived were uncovered among the unidentified rat spectra in the PRIDE Cluster archive.
