Predicting retention time in hydrophilic interaction liquid chromatography mass spectrometry and its use for peak annotation in metabolomics

Liquid chromatography coupled to mass spectrometry (LCMS) is widely used in metabolomics due to its sensitivity, reproducibility, speed and versatility. Metabolites are detected as peaks characterised by mass-to-charge ratio (m/z) and retention time (rt), and one of the most critical but also most challenging tasks in metabolomics is to annotate the large number of peaks detected in biological samples. Accurate m/z measurements enable the prediction of molecular formulae, which provide clues to the chemical identity of peaks, but many metabolites share identical molecular formulae. Chromatographic behaviour, which reflects the physicochemical properties of metabolites, should also provide structural information. However, the variation in rt between analytical runs, and the complicating factors underlying the observed time shifts, make the use of such information for peak annotation a non-trivial task. To this end, we conducted Quantitative Structure–Retention Relationship (QSRR) modelling between the calculated molecular descriptors (MDs) and the experimental retention times (rts) of 93 authentic compounds analysed using hydrophilic interaction liquid chromatography (HILIC) coupled to high resolution MS. A predictive QSRR model based on the Random Forests algorithm outperformed a Multiple Linear Regression-based model, and achieved a high correlation between predicted and experimental rts (Pearson’s correlation coefficient = 0.97), with mean and median absolute errors of 0.52 min and 0.34 min (corresponding to 5.1 % and 3.2 % error), respectively. We demonstrate that rt prediction with the precision achieved enables the systematic utilisation of rts for annotating unknown peaks detected in a metabolomics study. Applying the QSRR model with the strategy we outline enhanced the peak annotation process by reducing the number of false positives that arise when databases are queried by accurate mass alone, and by enriching the reference library. The predicted rts were validated using either authentic compounds or ion fragmentation patterns.
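
As a rough illustration of the two quantities discussed above, the following minimal Python sketch computes the mean and median absolute error between predicted and experimental rts, and then uses predicted rts to narrow a list of candidates that share one accurate mass. All compound names, rt values and the 1.0 min tolerance are hypothetical placeholders chosen for the example; they are not taken from the study, and the filtering rule is only one plausible reading of how predicted rts could reduce false positives, not the authors' actual procedure.

```python
from statistics import mean, median


def absolute_errors(predicted, observed):
    """Absolute difference (min) between each predicted and observed rt."""
    return [abs(p - o) for p, o in zip(predicted, observed)]


def filter_candidates(observed_rt, candidates, tolerance_min):
    """Keep candidates whose predicted rt is within tolerance_min of the peak.

    candidates is a list of (name, predicted_rt) pairs that all matched the
    peak by accurate mass; tolerance_min is an assumed, user-chosen window.
    """
    return [name for name, predicted_rt in candidates
            if abs(predicted_rt - observed_rt) <= tolerance_min]


# Hypothetical predicted vs. experimental rts (min) for four standards.
predicted = [2.0, 5.5, 9.0, 12.0]
observed = [2.5, 5.0, 9.25, 14.0]
errors = absolute_errors(predicted, observed)
mae = mean(errors)
medae = median(errors)

# Hypothetical isomers returned by a mass-only database query for one peak.
candidates = [("isomer_A", 4.0), ("isomer_B", 7.5), ("isomer_C", 11.0)]
kept = filter_candidates(observed_rt=7.0, candidates=candidates,
                         tolerance_min=1.0)

print(f"mean abs error = {mae} min, median abs error = {medae} min")
print("candidates kept:", kept)
```

In practice the tolerance would presumably be tied to the model's observed prediction error rather than fixed arbitrarily, since a window that is too narrow discards true matches and one that is too wide removes few false positives.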
