Retention Time Prediction Improves Identification in Nontargeted Lipidomics Approaches.

Identifying lipids in nontargeted lipidomics based on liquid chromatography coupled to mass spectrometry (LC-MS) remains a major challenge. While both accurate mass and fragment spectra contain valuable information, retention time (tR) provides additional evidence that can augment these data. We present a retention time model based on machine learning that improves the assignment of lipid structures and enables automated annotation of lipidomics data. In contrast to common approaches, we trained a support vector regression (SVR) model that includes molecular structural features on a complex mixture of 201 lipids originating from fat tissue rather than on a standard mixture. The cross-validated model achieves a correlation coefficient of r = 0.989 between predicted and experimental retention times of the test samples. When our retention time model is combined with identification via accurate mass search (AMS) of lipids against the comprehensive LIPID MAPS database, retention time filtering can significantly reduce the rate of false positives in complex data sets such as adipose tissue extracts. In our case, filtering with retention time information removed more than half of the potential identifications while retaining 95% of the correct identifications. Combining high-precision retention time prediction with accurate mass can thus significantly narrow down the number of hypotheses to be assessed for lipid identification in complex lipid patterns such as tissue profiles.
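
The two-stage idea described above, an accurate mass search followed by retention time filtering, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the candidate names, m/z values, predicted retention times, and both tolerances (5 ppm, 0.5 min) are hypothetical values chosen for illustration, and the predicted retention times are assumed to come from some already trained model such as an SVR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str            # hypothetical lipid label, not a real database entry
    mz: float            # theoretical m/z of the candidate ion
    predicted_rt: float  # retention time from an assumed trained model, in minutes


def ppm_error(observed_mz: float, theoretical_mz: float) -> float:
    """Relative mass deviation in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def accurate_mass_search(observed_mz, database, ppm_tol=5.0):
    """Keep every database entry whose m/z lies within ppm_tol of the observation."""
    return [c for c in database if abs(ppm_error(observed_mz, c.mz)) <= ppm_tol]


def retention_time_filter(observed_rt, candidates, rt_tol=0.5):
    """Keep candidates whose predicted tR lies within rt_tol minutes of the observation."""
    return [c for c in candidates if abs(c.predicted_rt - observed_rt) <= rt_tol]


# Hypothetical toy database; all values are invented for illustration only.
database = [
    Candidate("lipid_A", 760.5851, 12.1),
    Candidate("lipid_B", 760.5860, 15.8),
    Candidate("lipid_C", 760.5845, 12.4),
    Candidate("lipid_D", 788.6164, 14.0),
]

# One hypothetical LC-MS feature: observed m/z and observed retention time.
observed_mz, observed_rt = 760.5853, 12.2

mass_hits = accurate_mass_search(observed_mz, database)
rt_hits = retention_time_filter(observed_rt, mass_hits)

print("after accurate mass search:", [c.name for c in mass_hits])
print("after retention time filter:", [c.name for c in rt_hits])
```

In this toy case the mass search alone cannot separate three near-isobaric candidates, while the retention time window discards the one whose predicted tR is far from the observation. In practice the window width would presumably be derived from the prediction error of the model rather than fixed by hand.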
