Model selection for within-batch effect correction in UPLC-MS metabolomics using quality control - support vector regression

Ultra-performance liquid chromatography-mass spectrometry (UPLC-MS) is increasingly used for untargeted metabolomics in biomedical research. Complex matrices and large numbers of samples per analytical batch cause gradual changes in the instrumental response (i.e., within-batch effects) that reduce repeatability and reproducibility and limit the power to detect biological responses. A strategy for within-batch effect correction based on quality control (QC) samples and support vector regression (QC-SVRC) with a radial basis function kernel was recently proposed. QC-SVRC requires the optimization of three hyperparameters that determine how accurately within-batch effects are eliminated: the tolerance threshold (ε), the penalty term (C) and the kernel width (γ). This work compares three widely used strategies for QC-SVRC hyperparameter optimization (grid search, random search and particle swarm optimization) using a UPLC-MS data set containing 193 urine injections as a model example. The results show that QC-SVRC is robust to hyperparameter selection and that pre-selecting C and ε and then optimizing γ is competitive with full grid search, random search and particle swarm optimization in terms of accuracy, precision and number of function evaluations. The QC-SVRC optimization procedure can be regarded as a useful non-parametric tool that efficiently complements alternative approaches such as QC-robust splines correction (RSC).
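The strategy of fixing C and ε and searching only over γ can be illustrated with a minimal sketch, assuming scikit-learn's `SVR` as the regression engine. Everything specific in it is an assumption made for illustration and is not taken from the study: the synthetic single-feature batch, the QC spacing (every 8th injection), the fixed values `C_fixed` and `eps_fixed`, the γ grid, and the use of leave-one-out cross-validation with mean absolute error as the selection criterion.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, LeaveOneOut
from sklearn.svm import SVR

rng = np.random.default_rng(0)

# Synthetic batch (assumption): 193 injections, every 8th one is a QC.
n_inj = 193
order = np.arange(n_inj)
is_qc = order % 8 == 0
x = (order / (n_inj - 1)).reshape(-1, 1)  # injection order scaled to [0, 1]

# Simulated within-batch drift plus 3 % multiplicative noise for one feature.
drift = 1.0 - 0.3 * x.ravel() + 0.05 * np.sin(order / 15.0)
intensity = 1000.0 * drift * rng.normal(1.0, 0.03, size=n_inj)

# QC response relative to the QC median, so the model learns relative drift.
y_qc = intensity[is_qc] / np.median(intensity[is_qc])

# Pre-selected C and epsilon (illustrative values); only gamma is searched.
C_fixed, eps_fixed = 10.0, 0.02
gammas = np.logspace(-2, 2, 9)

search = GridSearchCV(
    SVR(kernel="rbf", C=C_fixed, epsilon=eps_fixed),
    param_grid={"gamma": gammas},
    cv=LeaveOneOut(),
    scoring="neg_mean_absolute_error",
)
search.fit(x[is_qc], y_qc)
best_gamma = search.best_params_["gamma"]

# Predict the relative drift for every injection and divide it out.
rel_drift = search.best_estimator_.predict(x)
corrected = intensity / rel_drift


def rsd(values):
    """Relative standard deviation in percent."""
    return 100.0 * np.std(values, ddof=1) / np.mean(values)


rsd_before = rsd(intensity[is_qc])
rsd_after = rsd(corrected[is_qc])
print(f"gamma = {best_gamma:g}; QC RSD {rsd_before:.1f} % -> {rsd_after:.1f} %")
```

With C and ε fixed, the search evaluates 9 candidate models instead of the 9 × 9 × 9 = 729 a full three-parameter grid of the same resolution would require. The QC RSD after correction is computed on the same QCs used for fitting, so it is optimistic; a held-out QC set would give a fairer estimate.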
