Preprocessing of NMR metabolomics data

Abstract: Metabolomics is the large-scale analysis of metabolites and thus provides information about the cellular processes in a biological sample. Regardless of the analytical technique used, metabolomics studies generate large volumes of data, resulting in complex datasets with large numbers of variables. Such data require multivariate statistical analysis for proper biological interpretation. Before multivariate analysis, the data must be preprocessed to remove unwanted variation, such as instrumental or experimental artifacts. This review outlines the steps in the preprocessing of NMR metabolomics data and describes some of the methods used to perform them. Because different preprocessing methods may produce different results, an appropriate pipeline is needed for selecting the optimal combination of methods in the preprocessing workflow.
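The claim that different preprocessing choices can lead to different multivariate results can be illustrated with a minimal sketch. The abstract does not name specific methods, so everything here is an assumption chosen for illustration: the synthetic data matrix, the three scaling options (mean centering, autoscaling, Pareto scaling), and the helper names. It is not the review's own pipeline.

```python
import numpy as np

def mean_center(X):
    """Subtract each variable's mean (columns are variables)."""
    return X - X.mean(axis=0)

def autoscale(X):
    """Mean-center, then divide each variable by its standard deviation."""
    return mean_center(X) / X.std(axis=0, ddof=1)

def pareto_scale(X):
    """Mean-center, then divide each variable by the square root of its standard deviation."""
    return mean_center(X) / np.sqrt(X.std(axis=0, ddof=1))

def first_pc_loadings(X):
    """Loadings of the first principal component, obtained via SVD of the preprocessed matrix."""
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return vt[0]

# Hypothetical data: 20 samples x 5 variables with very different intensity scales.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5)) * np.array([100.0, 10.0, 1.0, 1.0, 0.1]) + 50.0

for name, scale in [("center", mean_center), ("auto", autoscale), ("pareto", pareto_scale)]:
    loadings = first_pc_loadings(scale(X))
    print(name, np.round(np.abs(loadings), 2))
```

With centering alone, the first component is expected to be dominated by the highest-variance variable, whereas autoscaling gives every variable unit variance and spreads the loadings more evenly. Which behavior is preferable depends on the study, which is why the choice of preprocessing steps should be made deliberately.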


[33]  J. Lindon,et al.  Scaling and normalization effects in NMR spectroscopic metabonomic data sets. , 2006, Analytical chemistry.

[34]  R. Beynon,et al.  Metabolomics as a diagnostic tool for hepatology: validation in a naturally occurring canine model , 2005, Metabolomics.

[35]  J. Mo,et al.  Baseline correction by improved iterative polynomial fitting with automatic threshold , 2006 .

[36]  S. Wijmenga,et al.  NMR and pattern recognition methods in metabolomics: from data acquisition to biomarker discovery: a review. , 2012, Analytica chimica acta.

[37]  Yufeng J Tseng,et al.  Distribution-based classification method for baseline correction of metabolomic 1D proton nuclear magnetic resonance spectra. , 2013, Analytical chemistry.

[38]  S. Manahan Toxicological chemistry and biochemistry , 1988 .

[39]  U. Roessner,et al.  Metabolomics in Functional Genomics and Systems Biology , 2006 .

[40]  Manuel Martín-Pastor,et al.  A new general-purpose fully automatic baseline-correction procedure for 1D and 2D NMR data. , 2006, Journal of magnetic resonance.

[41]  J. Klawitter,et al.  The Role of Metabolomics in the Study of Kidney Diseases and in the Development of Diagnostic Tools , 2017 .

[42]  R. Kaddurah-Daouk,et al.  High-Performance Liquid Chromatography Separations Coupled With Coulometric Electrode Array Detectors , 2007 .

[43]  Daniel Raftery,et al.  Comparing and combining NMR spectroscopy and mass spectrometry in metabolomics , 2007, Analytical and bioanalytical chemistry.

[44]  Neil E. Jacobsen,et al.  NMR spectroscopy explained : simplified theory, applications and examples for organic chemistry and structural biology , 2007 .

[45]  Zyad Shaaban,et al.  Data Mining: A Preprocessing Engine , 2006 .

[46]  F. J. Holler,et al.  Principles of Instrumental Analysis , 1973 .

[47]  Kristian Hovde Liland,et al.  Optimal Choice of Baseline Correction for Multivariate Calibration of Spectra , 2010, Applied spectroscopy.

[48]  Kashif Ali,et al.  NMR‐Based Metabolomics Analysis , 2013 .

[49]  David M. Rocke,et al.  Baseline Correction for NMR Spectroscopic Metabolomics Data Analysis , 2008, BMC Bioinformatics.

[50]  S. Mohamed,et al.  Statistical Normalization and Back Propagation for Classification , 2022 .

[51]  A. Bax,et al.  Baseline correction of 2D FT NMR spectra using a simple linear prediction extrapolation of the time-domain data , 1989 .

[52]  R. Kaddurah-Daouk,et al.  High-performance liquid chromatography separations coupled with coulometric electrode array detectors: a unique approach to metabolomics. , 2007, Methods in molecular biology.

[53]  Abraham Nyska,et al.  Discovery of Metabolomics Biomarkers for Early Detection of Nephrotoxicity , 2009, Toxicologic pathology.

[54]  R. H. Jellema,et al.  2.06 – Variable Shift and Alignment , 2009 .

[55]  A. Heuer,et al.  A new method for suppressing baseline distortions in FT NMR , 1989 .

[56]  K. Laukens,et al.  Getting Your Peaks in Line: A Review of Alignment Methods for NMR Spectral Data , 2013, Metabolites.

[57]  B. Daviss Growing pains for metabolomics , 2005 .

[58]  L. Buydens,et al.  Warping methods for spectroscopic and chromatographic signal alignment: a tutorial. , 2013, Analytica chimica acta.

[59]  W. Dietrich,et al.  Fast and precise automatic baseline correction of one- and two-dimensional nmr spectra , 1991 .

[60]  F Savorani,et al.  icoshift: A versatile tool for the rapid alignment of 1D NMR spectra. , 2010, Journal of magnetic resonance.

[61]  Andreas Bender,et al.  Understanding and Classifying Metabolite Space and Metabolite-Likeness , 2011, PloS one.

[62]  Tarja Rajalahti,et al.  Discriminating variable test and selectivity ratio plot: quantitative tools for interpretation and variable (biomarker) selection in complex spectral or chromatographic profiles. , 2009, Analytical chemistry.

[63]  R. A. van den Berg,et al.  Centering, scaling, and transformations: improving the biological information content of metabolomics data , 2006, BMC Genomics.

[64]  Royston Goodacre,et al.  Metabolomics: Current technologies and future trends , 2006, Proteomics.

[65]  John C. Lindon,et al.  The handbook of metabonomics and metabolomics , 2007 .

[66]  Zou Xiaobo,et al.  Variables selection methods in near-infrared spectroscopy. , 2010, Analytica chimica acta.

[67]  L. Buydens,et al.  Alignment of high resolution magic angle spinning magnetic resonance spectra using warping methods. , 2010, Analytica chimica acta.