DPHL: A DIA Pan-human Protein Mass Spectrometry Library for Robust Biomarker Discovery

To address the increasing need to detect and validate protein biomarkers in clinical specimens, mass spectrometry (MS)-based targeted proteomic techniques have been developed, including selected reaction monitoring (SRM), parallel reaction monitoring (PRM), and massively parallel data-independent acquisition (DIA). For optimal performance, these techniques require the fragment ion spectra of the targeted peptides as prior knowledge. In this report, we describe an MS pipeline and spectral resource to support targeted proteomics studies of human tissue samples. To build the spectral resource, we integrated common open-source MS computational tools into a freely accessible, Docker-based computational workflow. We then applied the workflow to generate DPHL, a comprehensive DIA pan-human library, from 1096 data-dependent acquisition (DDA) MS raw files covering 16 types of cancer samples. This spectral resource was first applied to a proteomic study of 17 prostate cancer (PCa) patients. PRM was then applied to a larger study of 57 PCa patients, which validated the differential expression of three proteins in prostate tumors. As a second application, DPHL was applied to a study of plasma samples from 19 diffuse large B cell lymphoma (DLBCL) patients and 18 healthy control subjects. Proteins differentially expressed between DLBCL patients and healthy control subjects were detected by DIA-MS and confirmed by PRM. These data demonstrate that DPHL supports DIA and PRM MS pipelines for robust protein biomarker discovery. DPHL is freely accessible at https://www.iprox.org/page/project.html?id=IPX0001400000.
