Evaluation of Machine Learning Models for Proteoform Retention and Migration Time Prediction in Top-Down Mass Spectrometry

Reversed-phase liquid chromatography (RPLC) and capillary zone electrophoresis (CZE) are two widely used proteoform separation methods in mass spectrometry (MS)-based top-down proteomics. Predicting proteoform retention time in RPLC and migration time in CZE provides additional information that can increase the accuracy of proteoform identification and quantification. Existing methods for retention and migration time prediction focus mainly on peptides in bottom-up MS, and methods for proteoforms in top-down MS are still lacking. We systematically evaluated six models for proteoform retention and/or migration time prediction in top-down MS. The Prosit model achieved high accuracy (R² > 0.91) for proteoform retention time prediction, and both the Prosit model and a fully connected neural network model achieved high accuracy (R² > 0.94) for proteoform migration time prediction.
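The accuracy figures above are reported as R² values. The abstract does not say which definition of R² was used, so the sketch below assumes the coefficient of determination, 1 - SS_res / SS_tot, computed between observed and predicted times. The function name `r_squared` and the example times are hypothetical and are not taken from the study.

```python
from typing import Sequence


def r_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination between observed and predicted times.

    Assumption: R^2 = 1 - SS_res / SS_tot. The study may instead have used
    the squared Pearson correlation, which can give a different value.
    """
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    if len(observed) < 2:
        raise ValueError("at least two data points are required")

    mean_obs = sum(observed) / len(observed)
    # Residual sum of squares: error of the model's predictions.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total sum of squares: error of always predicting the observed mean.
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values are constant; R^2 is undefined")
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical retention times in minutes, for illustration only.
    observed_rt = [10.0, 20.0, 30.0, 40.0]
    predicted_rt = [12.0, 18.0, 33.0, 37.0]
    print(round(r_squared(observed_rt, predicted_rt), 3))  # 0.948
```

Under this definition a model that always predicts the mean observed time scores 0, and a perfect model scores 1, so the reported thresholds of 0.91 and 0.94 would mean the models explain most of the variance in the measured times.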
