Predicting Structural Motifs of Glycosaminoglycans using Cryogenic Infrared Spectroscopy and Random Forest

In recent years, glycosaminoglycans (GAGs) have moved into the focus of biochemical and biomedical research because of their importance in a variety of physiological processes. These molecules are structurally highly diverse, which makes their analysis challenging. A promising tool for identifying the structural motifs and conformation of shorter GAG chains is cryogenic gas-phase infrared (IR) spectroscopy. In this work, the cryogenic gas-phase IR spectra of mass-selected heparan sulfate (HS) di-, tetra-, and hexasaccharide ions were recorded to extract vibrational features that are characteristic of structural motifs. The data were augmented with chondroitin sulfate (CS) disaccharide spectra to assemble a training library for random forest (RF) classifiers. These classifiers were used to discriminate between GAG classes (CS or HS) and between sulfate positions (2-O-, 4-O-, 6-O-, and N-sulfation). With optimized data preprocessing and RF modeling, a prediction accuracy of >97% was achieved for HS tetra- and hexasaccharides based on a training set of only 21 spectra. These results illustrate the value of combining cryogenic gas-phase IR ion spectroscopy with machine learning to improve future analytical workflows for GAG sequencing and for the analysis of other biomolecules, such as metabolites.
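As a rough illustration of the classification step summarized above, the following minimal sketch trains a small random forest (bootstrap resampling plus random feature subsets at each split) to separate two GAG classes from binned spectra. It is not the authors' pipeline: the spectra are synthetic, the band positions (bins 5 and 12), the noise level, the tree depth, the forest size, and all function names are assumptions chosen for illustration, and the preprocessing and sulfate-position classifiers described in the abstract are not reproduced.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(X, y, features):
    """Return the (feature, threshold) that lowers weighted Gini most, or None."""
    best, best_score = None, gini(y)
    for f in features:
        values = sorted(set(row[f] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # midpoint, so both sides are non-empty
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best_score:
                best, best_score = (f, t), score
    return best

def grow_tree(X, y, rng, n_try, depth=0, max_depth=4):
    """Grow one tree; a leaf is a label, an inner node is (f, t, left, right)."""
    majority = Counter(y).most_common(1)[0][0]
    if len(set(y)) == 1 or depth == max_depth:
        return majority
    # Random feature subset at every node: the "random" in random forest.
    split = best_split(X, y, rng.sample(range(len(X[0])), n_try))
    if split is None:
        return majority
    f, t = split
    left = [i for i, row in enumerate(X) if row[f] <= t]
    right = [i for i, row in enumerate(X) if row[f] > t]
    return (f, t,
            grow_tree([X[i] for i in left], [y[i] for i in left],
                      rng, n_try, depth + 1, max_depth),
            grow_tree([X[i] for i in right], [y[i] for i in right],
                      rng, n_try, depth + 1, max_depth))

def predict_tree(node, row):
    """Walk from the root to a leaf and return its label."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node

def fit_forest(X, y, n_trees=51, seed=0):
    """Train n_trees trees, each on a bootstrap sample of the spectra."""
    rng = random.Random(seed)
    n_try = max(1, int(len(X[0]) ** 0.5))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in X]  # sample with replacement
        forest.append(grow_tree([X[i] for i in idx], [y[i] for i in idx],
                                rng, n_try))
    return forest

def predict(forest, row):
    """Majority vote over all trees."""
    return Counter(predict_tree(t, row) for t in forest).most_common(1)[0][0]

def synthetic_spectrum(label, rng, n_bins=20):
    """Hypothetical binned spectrum: one class-specific band plus noise."""
    center = 5 if label == "HS" else 12  # assumed band positions
    return [(1.0 if abs(b - center) <= 1 else 0.0) + rng.uniform(0.0, 0.2)
            for b in range(n_bins)]

data_rng = random.Random(42)
y_train = ["HS", "CS"] * 10
X_train = [synthetic_spectrum(lab, data_rng) for lab in y_train]
y_test = ["HS", "CS"] * 10
X_test = [synthetic_spectrum(lab, data_rng) for lab in y_test]

forest = fit_forest(X_train, y_train)
accuracy = sum(predict(forest, s) == lab
               for s, lab in zip(X_test, y_test)) / len(y_test)
print(f"held-out accuracy on synthetic spectra: {accuracy:.2f}")
```

Because the synthetic bands are cleanly separated, the accuracy printed here says nothing about real spectra. In practice one would more likely use an established implementation such as scikit-learn's `RandomForestClassifier` rather than a hand-written forest.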
