Detecting protein variants by mass spectrometry: a comprehensive study in cancer cell-lines

Background: Onco-proteogenomics aims to understand how changes in a cancer’s genome influence its proteome. One challenge in integrating these molecular data is the identification of aberrant protein products from mass-spectrometry (MS) datasets, as traditional proteomic analyses only identify proteins present in a reference sequence database.

Methods: We established proteomic workflows to detect peptide variants within MS datasets. We combined publicly available population variants (dbSNP and UniProt) and somatic variations in cancer (COSMIC) with sample-specific genomic and transcriptomic data to examine proteome variation within and across 59 cancer cell-lines.

Results: We developed a set of recommendations for the detection of variants using three search algorithms, a split target-decoy approach for FDR estimation, and multiple post-search filters. We examined 7.3 million unique variant tryptic peptides not found within any reference proteome and identified 4771 mutations corresponding to somatic and germline deviations from reference proteomes in 2200 genes among the NCI60 cell-line proteomes.

Conclusions: We discuss in detail the technical and computational challenges in identifying variant peptides by MS and show that uncovering these variants allows the identification of druggable mutations within important cancer genes.
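The abstract names a split target-decoy approach for FDR estimation but does not spell out its details. The sketch below shows one common reading of that idea, under stated assumptions: peptide-spectrum matches (PSMs) are divided into a variant class and a reference class, and the FDR is estimated separately within each class as the ratio of decoy to target hits above a score cutoff. The `PSM` record, its field names, and both function names are hypothetical and are not taken from the paper.

```python
from collections import namedtuple

# Hypothetical PSM record; the field names are assumptions for illustration.
# is_variant marks hits (targets and their decoys) from the variant database.
PSM = namedtuple("PSM", ["peptide", "score", "is_decoy", "is_variant"])


def filter_at_fdr(psms, max_fdr=0.01):
    """Keep target PSMs above the lowest score cutoff that satisfies max_fdr.

    FDR at a cutoff is estimated as (decoy hits) / (target hits) among all
    PSMs scoring at or above that cutoff. Higher scores are assumed better.
    """
    ranked = sorted(psms, key=lambda p: p.score, reverse=True)
    targets = decoys = 0
    cutoff = 0  # number of top-ranked PSMs retained
    for i, psm in enumerate(ranked, start=1):
        if psm.is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets and decoys / targets <= max_fdr:
            cutoff = i
    return [p for p in ranked[:cutoff] if not p.is_decoy]


def split_target_decoy(psms, max_fdr=0.01):
    """Estimate FDR separately for variant and reference PSMs."""
    variant = [p for p in psms if p.is_variant]
    reference = [p for p in psms if not p.is_variant]
    return {
        "variant": filter_at_fdr(variant, max_fdr),
        "reference": filter_at_fdr(reference, max_fdr),
    }


if __name__ == "__main__":
    example = [
        PSM("REFA", 10.0, False, False),
        PSM("REFB", 9.0, False, False),
        PSM("VARA", 8.0, False, True),
        PSM("DECOY", 7.0, True, True),
        PSM("VARB", 6.0, False, True),
    ]
    passed = split_target_decoy(example, max_fdr=0.01)
    print([p.peptide for p in passed["variant"]])
    print([p.peptide for p in passed["reference"]])
```

The motivation for splitting is that variant peptides are usually a small fraction of all identifications, so a pooled estimate dominated by abundant reference hits can understate the error rate among the variant hits. In the toy input, the variant-class decoy stops `VARB` from passing, while the reference class is unaffected.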
