Cancer neoantigen prioritization through sensitive and reliable proteogenomics analysis

Genomics-based neoantigen discovery can be enhanced by proteomic evidence, but there remains a lack of consensus on the performance of different quality control methods for variant peptide identification in proteogenomics. We propose to use the difference between accurately predicted and observed retention times for each peptide as a metric to evaluate different quality control methods. To this end, we develop AutoRT, a deep learning algorithm with high accuracy in retention time prediction. Analysis of three cancer data sets with a total of 287 tumor samples using different quality control strategies results in substantially different numbers of identified variant peptides and putative neoantigens. Our systematic evaluation, using the proposed retention time metric, provides insights and practical guidance on the selection of quality control strategies. We implement the recommended strategy in a computational workflow named NeoFlow to support proteogenomics-based neoantigen prioritization, enabling more sensitive discovery of putative neoantigens.

Identifying mutation-derived neoantigens by proteogenomics requires robust strategies for quality control. Here, the authors propose peptide retention time as an evaluation metric for proteogenomics quality control methods, and develop a deep learning algorithm for accurate retention time prediction.
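The retention time metric described above can be illustrated with a minimal sketch. This is not AutoRT or NeoFlow code: the function names, the quality control labels, the peptide identifiers, and all retention time values below are hypothetical, and the median absolute error is only one plausible way to summarize the per-peptide differences. The idea is that a set of identifications containing more false matches should show larger gaps between predicted and observed retention times.

```python
from statistics import median


def rt_errors(peptides, predicted_rt, observed_rt):
    """Absolute predicted-vs-observed retention time difference per peptide."""
    return [abs(predicted_rt[p] - observed_rt[p]) for p in peptides]


def summarize_qc_methods(identified_by_method, predicted_rt, observed_rt):
    """Map each QC method to (number of peptides, median absolute RT error)."""
    summary = {}
    for method, peptides in identified_by_method.items():
        errors = rt_errors(peptides, predicted_rt, observed_rt)
        summary[method] = (len(peptides), median(errors) if errors else None)
    return summary


# Toy values in minutes (hypothetical, for illustration only).
observed_rt = {"PEPA": 20.0, "PEPB": 35.0, "PEPC": 50.0, "PEPD": 65.0}
predicted_rt = {"PEPA": 21.0, "PEPB": 34.0, "PEPC": 70.0, "PEPD": 66.5}

# Variant peptides accepted under two hypothetical QC strategies.
identified_by_method = {
    "strict_qc": ["PEPA", "PEPB", "PEPD"],
    "lenient_qc": ["PEPA", "PEPB", "PEPC", "PEPD"],
}

summary = summarize_qc_methods(identified_by_method, predicted_rt, observed_rt)
for method, (n_peptides, med_err) in summary.items():
    print(f"{method}: {n_peptides} peptides, median |RT error| = {med_err} min")
```

In this toy setup the lenient strategy accepts one more peptide, but that peptide has a large retention time discrepancy, which raises the summary error. Comparing the number of identifications against such an error summary is one way to weigh sensitivity against reliability across strategies.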
