Scale-Invariant Biomarker Discovery in Urine and Plasma Metabolite Fingerprints.

Metabolomics data are typically scaled to a common reference such as a constant volume of body fluid, a constant creatinine level, or a constant area under the spectrum. Such scaling, however, may affect the selection of biomarkers and the biological interpretation of results in unforeseen ways. Here, we studied how the choice of scale affects both the outcome of hypothesis tests for differential metabolite concentration and the screening for multivariate metabolite signatures. To overcome this problem for metabolite signatures and to establish a scale-invariant biomarker discovery algorithm, we extended linear zero-sum regression to the logistic regression framework and showed in two applications to 1H NMR-based metabolomics data how this approach overcomes the scaling problem. Logistic zero-sum regression is available as an R package as well as a high-performance computing implementation that can be downloaded at https://github.com/rehbergT/zeroSum.
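The scale invariance behind the zero-sum constraint can be illustrated with a minimal sketch. It assumes the features are log-transformed intensities (not stated above) and uses made-up intensities and hand-picked coefficients; it does not use the zeroSum package or its API. Rescaling a sample by a factor c adds log(c) to every log-feature, so the linear predictor shifts by log(c) times the sum of the coefficients, which vanishes when the coefficients sum to zero.

```python
import math
import random

def predict_proba(sample, beta, intercept=0.0):
    """Logistic prediction on log-transformed intensities (assumed preprocessing)."""
    eta = intercept + sum(b * math.log(v) for b, v in zip(beta, sample))
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical sample: four positive metabolite intensities (made-up values).
rng = random.Random(0)
sample = [rng.uniform(0.5, 5.0) for _ in range(4)]

# The same sample on a different scale, e.g. a different dilution or reference.
scale_factor = 3.7
rescaled = [scale_factor * v for v in sample]

# Illustrative coefficients, not fitted to any data.
beta_zero_sum = [0.8, -0.3, -0.6, 0.1]      # sums to zero
beta_unconstrained = [0.8, -0.3, -0.6, 0.4]  # sums to 0.3

p_zs_original = predict_proba(sample, beta_zero_sum)
p_zs_rescaled = predict_proba(rescaled, beta_zero_sum)
p_free_original = predict_proba(sample, beta_unconstrained)
p_free_rescaled = predict_proba(rescaled, beta_unconstrained)

# Zero-sum coefficients give the same prediction on both scales;
# unconstrained coefficients do not.
print("zero-sum shift:     ", abs(p_zs_original - p_zs_rescaled))
print("unconstrained shift:", abs(p_free_original - p_free_rescaled))
```

With the zero-sum coefficients the two predictions agree up to floating-point error, whereas the unconstrained coefficients yield a prediction that depends on the scale factor.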
