Cage of covariance in calibration modeling: Regressing multiple and strongly correlated response variables onto a low rank subspace of explanatory variables

ABSTRACT: In analytical chemistry, multivariate calibration is used to replace a time-consuming reference measurement (e.g. chromatography) with a high-throughput measurement (e.g. vibrational spectroscopy). The performance of a calibration model is often evaluated by an average error term for the response variable. However, a calibration may rely on indirect relationships between the response and explanatory variables. In such cases, the average error term alone is not necessarily sufficient to establish model validity: one should also consider how the model will be used and whether the indirect relationships will hold in future samples. If the analyte of interest is partly quantified from the signals of interfering compounds, these compounds play a hidden role in the calibration. This hidden role may affect future use of the calibration model, because strong covariance relationships between the analyte estimates and the interfering compounds may be imposed. Such a model therefore cannot detect changes in the relationship between the analyte and the interfering compounds. This problem is called the cage of covariance. This paper discusses the concept of the cage of covariance and the possible consequences of applying models that are subject to it.
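The effect described in the abstract can be illustrated with a small simulation. The sketch below is not the paper's method or data. It assumes two responses, of which only y1 has a (hypothetical, Gaussian-shaped) spectral band, and it uses rank-1 principal component regression as a simple stand-in for a low-rank calibration. All names, sample sizes and noise levels are illustrative assumptions. The model is calibrated on samples where y1 and y2 are strongly correlated and then applied to samples where that correlation is absent.

```python
import numpy as np

rng = np.random.default_rng(0)


def make_samples(n, corr, rng):
    """Simulate two responses with correlation `corr`.

    Assumption for illustration: only y1 contributes a spectral band;
    y2 has no signal of its own in X.
    """
    cov = np.array([[1.0, corr], [corr, 1.0]])
    Y = rng.multivariate_normal([0.0, 0.0], cov, size=n)
    axis = np.linspace(0.0, 1.0, 50)
    band = np.exp(-((axis - 0.5) ** 2) / 0.01)  # hypothetical band of y1
    X = np.outer(Y[:, 0], band) + 0.05 * rng.standard_normal((n, axis.size))
    return X, Y


def fit_pcr(X, Y, rank):
    """Principal component regression of all responses on `rank` components."""
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - x_mean, full_matrices=False)
    V = Vt[:rank].T                      # loadings of the low-rank subspace
    T = (X - x_mean) @ V                 # scores
    Q, *_ = np.linalg.lstsq(T, Y - y_mean, rcond=None)
    return x_mean, y_mean, V @ Q         # regression coefficients in X space


def predict(model, X):
    x_mean, y_mean, B = model
    return (X - x_mean) @ B + y_mean


def rmse(a, b):
    return float(np.sqrt(np.mean((a - b) ** 2)))


# Calibrate where y1 and y2 covary strongly.
X_cal, Y_cal = make_samples(200, corr=0.95, rng=rng)
model = fit_pcr(X_cal, Y_cal, rank=1)

# Test set A: same covariance structure as the calibration set.
X_same, Y_same = make_samples(200, corr=0.95, rng=rng)
# Test set B: the relationship between y1 and y2 has changed.
X_shift, Y_shift = make_samples(200, corr=0.0, rng=rng)

P_same, P_shift = predict(model, X_same), predict(model, X_shift)

rmse_same = rmse(P_same[:, 1], Y_same[:, 1])
rmse_shifted = rmse(P_shift[:, 1], Y_shift[:, 1])
# Correlation between the two *predicted* responses in the shifted set.
corr_pred_shifted = float(np.corrcoef(P_shift[:, 0], P_shift[:, 1])[0, 1])
corr_true_shifted = float(np.corrcoef(Y_shift[:, 0], Y_shift[:, 1])[0, 1])

print("RMSE of y2, same covariance:   ", round(rmse_same, 3))
print("RMSE of y2, changed covariance:", round(rmse_shifted, 3))
print("corr(pred y1, pred y2), changed set:", round(corr_pred_shifted, 3))
print("corr(true y1, true y2), changed set:", round(corr_true_shifted, 3))
```

In this toy setting the average error for y2 looks acceptable as long as the calibration covariance holds. Once the covariance changes, the predicted y2 stays locked to the predicted y1 even though the true values are no longer related. The predictions cannot leave the covariance structure of the calibration set.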
