Phenotype Classification Using Proteome Data in a Data-Independent Acquisition Tensor Format

We developed a novel approach for phenotype prediction from data-independent acquisition (DIA) mass spectrometry (MS) data that does not require peptide precursor identification with existing DIA software tools. The first step converts the DIA-MS data file into a new file format called DIA tensor (DIAT), which allows convenient visualization of all the ions from peptide precursors and fragments. DIAT files can be fed directly into a deep neural network to predict phenotypes, in the same way that image classifiers recognize cats, dogs, or microscopic images. As a proof of principle, we applied this approach to 102 hepatocellular carcinoma samples and achieved an accuracy of 96.8% in distinguishing malignant from benign samples. We further applied a refined model to classify thyroid nodules. Deep learning based on 492 training samples achieved an accuracy of 91.7% in an independent cohort of 216 test samples. This approach outperformed a deep-learning model based on peptide and protein matrices generated by OpenSWATH. In summary, we present a new strategy for DIA data analysis based on a novel data format called DIAT, which enables facile two-dimensional visualization of DIA proteomics data. DIAT files can be used directly for deep learning in biological and clinical phenotype classification. Future research will focus on interpreting the deep-learning models that emerge from DIAT analysis.
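
The abstract does not specify the DIAT layout, so the following is only a minimal sketch of the general idea of turning DIA peaks into an image-like tensor. It assumes, hypothetically, that each peak is available as an (isolation window index, retention time, m/z, intensity) tuple and that the tensor is a simple grid of summed intensities. The function name `peaks_to_tensor`, the bin counts, and the ranges are illustrative choices, not part of the published format.

```python
def peaks_to_tensor(peaks, n_windows, n_rt_bins, n_mz_bins, rt_range, mz_range):
    """Bin DIA peaks into a dense [window][rt_bin][mz_bin] grid of summed intensity.

    Hypothetical layout for illustration only; the real DIAT format may differ.
    """
    rt_lo, rt_hi = rt_range
    mz_lo, mz_hi = mz_range
    # One 2-D "image" (retention time x m/z) per isolation window.
    tensor = [[[0.0] * n_mz_bins for _ in range(n_rt_bins)]
              for _ in range(n_windows)]
    for window, rt, mz, intensity in peaks:
        # Skip peaks that fall outside the configured grid.
        if not (0 <= window < n_windows):
            continue
        if not (rt_lo <= rt < rt_hi and mz_lo <= mz < mz_hi):
            continue
        i = int((rt - rt_lo) / (rt_hi - rt_lo) * n_rt_bins)
        j = int((mz - mz_lo) / (mz_hi - mz_lo) * n_mz_bins)
        tensor[window][i][j] += intensity
    return tensor


# Made-up peaks: (window index, retention time in min, m/z, intensity).
example_peaks = [
    (0, 15.0, 550.0, 100.0),
    (0, 15.0, 550.0, 50.0),    # same bin as the previous peak, so intensities add
    (1, 55.0, 1150.0, 7.0),
    (1, 999.0, 550.0, 1.0),    # outside the retention-time range, ignored
]
example_tensor = peaks_to_tensor(
    example_peaks, n_windows=2, n_rt_bins=6, n_mz_bins=8,
    rt_range=(0.0, 60.0), mz_range=(400.0, 1200.0),
)
```

Each per-window slice is a two-dimensional grid that can be rendered as an image or stacked as channels for a convolutional network, which is the sense in which such data can be treated like pictures. A real pipeline would need much finer binning and some form of intensity scaling.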
