DiagnoProt: a tool for discovery of new molecules by mass spectrometry

Motivation: Around 75% of all mass spectra remain unidentified by widely adopted proteomic strategies. We present DiagnoProt, an integrated computational environment that can efficiently cluster millions of spectra and use machine learning to shortlist high-quality unidentified mass spectra that are discriminative of different biological conditions.

Results: We exemplify the use of DiagnoProt by shortlisting 4366 high-quality unidentified tandem mass spectra that are discriminative of different types of the Aspergillus fungus.

Availability and Implementation: DiagnoProt, a demonstration video and a user tutorial are available at http://patternlabforproteomics.org/diagnoprot.

Contact: andrerfsilva@gmail.com or paulo@pcarvalho.com

Supplementary information: Supplementary data are available at Bioinformatics online.
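The cluster-then-shortlist idea described in the abstract can be illustrated with a minimal sketch. This is not DiagnoProt's implementation. It assumes the input has already been reduced to unidentified, reasonably high-quality MS/MS spectra, each labeled with its biological condition. It groups spectra greedily by precursor m/z and by the cosine similarity of binned, square-root-scaled peak vectors, and, in place of DiagnoProt's machine-learning step, it applies a simple purity filter that keeps clusters dominated by a single condition. The `Spectrum` class, the function names, the bin width and all thresholds are hypothetical choices made for illustration.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class Spectrum:
    """Hypothetical container for one unidentified MS/MS spectrum."""
    precursor_mz: float
    peaks: list[tuple[float, float]]  # (m/z, intensity) pairs
    condition: str                    # biological condition label


def to_vector(spectrum, bin_width=1.0):
    """Bin peaks by m/z, sqrt-scale intensities and L2-normalize (assumed scheme)."""
    bins = {}
    for mz, intensity in spectrum.peaks:
        b = int(mz / bin_width)
        bins[b] = bins.get(b, 0.0) + math.sqrt(intensity)
    norm = math.sqrt(sum(v * v for v in bins.values()))
    return {b: v / norm for b, v in bins.items()} if norm > 0 else {}


def cosine(u, v):
    """Vectors are unit length, so their dot product is the cosine similarity."""
    if len(u) > len(v):
        u, v = v, u  # iterate over the smaller vector
    return sum(val * v.get(b, 0.0) for b, val in u.items())


def greedy_cluster(spectra, min_cosine=0.8, precursor_tol=0.02):
    """Assign each spectrum to the first cluster whose representative matches."""
    clusters = []  # each entry: (representative precursor m/z, vector, members)
    for s in spectra:
        vec = to_vector(s)
        for rep_mz, rep_vec, members in clusters:
            if (abs(s.precursor_mz - rep_mz) <= precursor_tol
                    and cosine(vec, rep_vec) >= min_cosine):
                members.append(s)
                break
        else:  # no matching cluster: this spectrum starts a new one
            clusters.append((s.precursor_mz, vec, [s]))
    return [members for _, _, members in clusters]


def discriminative_clusters(clusters, min_size=3, min_purity=0.9):
    """Hypothetical stand-in for the ML step: keep clusters dominated by one condition."""
    shortlisted = []
    for members in clusters:
        condition, n = Counter(s.condition for s in members).most_common(1)[0]
        if len(members) >= min_size and n / len(members) >= min_purity:
            shortlisted.append((condition, members))
    return shortlisted


# Toy data: three similar spectra from condition "A", one unrelated spectrum from "B".
spectra = [
    Spectrum(500.25, [(100.0, 50.0), (200.0, 100.0), (300.0, 25.0)], "A"),
    Spectrum(500.26, [(100.1, 48.0), (200.2, 110.0), (300.3, 20.0)], "A"),
    Spectrum(500.25, [(100.2, 55.0), (200.1, 95.0), (300.1, 30.0)], "A"),
    Spectrum(812.40, [(150.0, 80.0), (450.0, 60.0)], "B"),
]
shortlist = discriminative_clusters(greedy_cluster(spectra))
for condition, members in shortlist:
    print(condition, len(members))  # prints: A 3
```

Because the greedy pass compares every spectrum against every existing cluster representative, it scales poorly; clustering millions of spectra efficiently, as the abstract describes, would require something like indexing clusters by precursor m/z so that only nearby candidates are compared.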