Phenotype Prediction Using a Tensor Representation and Deep Learning from Data-Independent Acquisition Mass Spectrometry

We developed a novel approach for predicting phenotypes from mass spectrometry data. First, data-independent acquisition (DIA) mass spectrometry data are converted into a novel file format called “DIA tensor” (DIAT), which contains all peptide precursor and fragment information and allows convenient visualization of DIA data. The DIAT representation is then fed directly into a deep neural network to predict phenotypes without the need to identify peptides or proteins. We applied this strategy to a collection of 102 hepatocellular carcinoma samples and achieved an accuracy of 96.8% in distinguishing malignant from benign samples. We further applied a refined model to 492 thyroid nodule samples to predict thyroid cancer and achieved a predictive accuracy of 91.7% in an independent cohort of 216 test samples. In conclusion, DIA tensor enables facile 2D visualization of DIA proteomics data and offers a new approach for phenotype prediction directly from DIA-MS data.
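The abstract does not spell out the DIAT layout. As a rough illustration only, the sketch below assumes one plausible representation: centroided MS2 fragment peaks from each DIA precursor isolation window are binned into a retention-time × fragment-m/z grid, giving a 3D array indexed as [window][RT bin][m/z bin] that can be summed over windows for a single 2D image. The `Ms2Peak` record, the `build_dia_tensor` and `to_2d_map` functions, and all ranges and bin counts are hypothetical and are not taken from the paper; the published DIAT format and how it encodes precursors may differ.

```python
from dataclasses import dataclass


@dataclass
class Ms2Peak:
    """One centroided fragment peak from a DIA MS2 scan (hypothetical input record)."""
    window: int        # index of the precursor isolation window
    rt: float          # retention time in minutes
    mz: float          # fragment m/z
    intensity: float


def build_dia_tensor(peaks, n_windows, rt_range, mz_range, rt_bins, mz_bins):
    """Accumulate peak intensities into a dense [window][rt_bin][mz_bin] grid.

    Assumed layout for illustration only; not the published DIAT specification.
    """
    rt_lo, rt_hi = rt_range
    mz_lo, mz_hi = mz_range
    rt_step = (rt_hi - rt_lo) / rt_bins
    mz_step = (mz_hi - mz_lo) / mz_bins
    tensor = [[[0.0] * mz_bins for _ in range(rt_bins)] for _ in range(n_windows)]
    for p in peaks:
        # Drop peaks whose window or coordinates fall outside the chosen grid.
        if not 0 <= p.window < n_windows:
            continue
        if not (rt_lo <= p.rt < rt_hi and mz_lo <= p.mz < mz_hi):
            continue
        i = min(int((p.rt - rt_lo) / rt_step), rt_bins - 1)
        j = min(int((p.mz - mz_lo) / mz_step), mz_bins - 1)
        tensor[p.window][i][j] += p.intensity
    return tensor


def to_2d_map(tensor):
    """Sum over isolation windows to get one RT x m/z intensity map for plotting."""
    rt_bins, mz_bins = len(tensor[0]), len(tensor[0][0])
    return [[sum(w[i][j] for w in tensor) for j in range(mz_bins)]
            for i in range(rt_bins)]


# Usage with toy peaks: 2 windows, 0-120 min in 10-min bins, m/z 100-1100 in 100-Th bins.
peaks = [
    Ms2Peak(window=0, rt=10.2, mz=450.3, intensity=1200.0),
    Ms2Peak(window=0, rt=10.4, mz=450.6, intensity=800.0),
    Ms2Peak(window=1, rt=55.0, mz=900.1, intensity=500.0),
    Ms2Peak(window=1, rt=130.0, mz=300.0, intensity=999.0),  # outside RT range, dropped
]
t = build_dia_tensor(peaks, n_windows=2, rt_range=(0.0, 120.0),
                     mz_range=(100.0, 1100.0), rt_bins=12, mz_bins=10)
print(t[0][1][3])  # 2000.0: both window-0 peaks land in the same cell
```

In a pipeline like the one described, a grid of this kind (or the summed 2D map) would be the input passed to a neural network classifier. A dense grid keeps every fragment signal without peptide identification, at the cost of memory that grows with the number of windows and the bin resolution.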
