Deep learning neural network tools for proteomics

Summary

Mass-spectrometry-based proteomics enables quantitative analysis of thousands of human proteins. However, experimental and computational challenges restrict progress in the field. This review summarizes the recent flurry of machine-learning strategies using artificial deep neural networks (or “deep learning”) that have started to break barriers and accelerate progress in the field of shotgun proteomics. Deep learning now accurately predicts physicochemical properties of peptides from their sequence, including tandem mass spectra and retention time. Furthermore, deep learning methods exist for nearly every aspect of the modern proteomics workflow, enabling improved feature selection, peptide identification, and protein inference.
