Better interpretable models after correcting for natural variation: Residual approaches examined

Abstract The interpretation of estimated model parameters in terms of biological information is often just as important as the predictions of the model itself. In this study we consider the identification of metabolites in a possibly biologically heterogeneous case group that show abnormal patterns with respect to a set of (healthy) control observations. For this purpose, we filter normal (baseline) natural variation from the data by projecting the data onto a model of the control samples and retaining the residuals: the residual approach. This step should make the abnormal metabolites easier to spot. Interpretation is, however, hindered by an effect we term 'residual bias', which may lead to the wrong metabolites being identified as 'abnormal'. This effect is related to the smearing effect known from contribution plots. We propose to alleviate residual bias by considering a weighted average of the filtered and raw data. In this way, a compromise is found between excluding irrelevant natural variation from the data and limiting the amount of residual bias that occurs. We show for simulated and real-world examples that this compromise may outperform inspection of either the raw or the filtered data alone. The method holds promise in numerous applications, such as disease diagnosis, personalized healthcare, and industrial process control.
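The residual approach and the weighted compromise described in the abstract can be illustrated with a minimal sketch. It assumes the control sample model is a PCA model fitted on the control observations; the abstract does not specify the model type, so this is an illustrative choice. The function names, the weight `alpha`, and the convention that `alpha = 1` means fully filtered data are all hypothetical and not taken from the paper.

```python
import numpy as np

def fit_control_pca(X_control, n_components):
    """Fit a PCA model on control samples (assumed form of the control model)."""
    mean = X_control.mean(axis=0)
    # Rows of Vt are the principal directions of the centered control data.
    _, _, Vt = np.linalg.svd(X_control - mean, full_matrices=False)
    P = Vt[:n_components].T  # loadings, shape (n_variables, n_components)
    return mean, P

def residual_filter(X, mean, P):
    """Remove the variation captured by the control model, keeping the residuals."""
    Xc = X - mean
    return Xc - Xc @ P @ P.T

def weighted_compromise(X, mean, P, alpha):
    """Weighted average of filtered (residual) and raw (centered) data.

    alpha = 0 returns the centered raw data, alpha = 1 the fully filtered data.
    The weighting convention is an assumption made for this sketch.
    """
    Xc = X - mean
    E = residual_filter(X, mean, P)
    return alpha * E + (1.0 - alpha) * Xc
```

In this sketch, intermediate values of `alpha` remove only part of the variation spanned by the control model, which is one way to trade off filtering of natural variation against residual bias. How the weight should be chosen is not addressed here.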
