VSN: Variable sorting for normalization

Spectrometric techniques, and analytical techniques in general, collect multivariate signals from chemical or biological materials by means of dedicated measurement instrumentation, usually in order to characterize or classify those materials through the estimation of one or several compounds of interest. However, measurement conditions can induce various additive (baseline) or multiplicative effects on the collected signals, which may jeopardize the accuracy and generalizability of estimation models. A common way of dealing with such issues is signal normalization and, in particular when the baseline is constant, the standard normal variate (SNV) transform. Despite its efficiency, SNV has important drawbacks in terms of physical interpretation and robustness of estimation models, because all variables are treated equally, regardless of their actual relationship with the response(s) of interest. In the present study, a novel algorithm is proposed, named variable sorting for normalization (VSN). For a given set of multivariate signals, this algorithm automatically produces a weighting function that favors signal variables impacted only by additive and multiplicative effects, and not by the response(s) of interest. When introduced into SNV preprocessing, this weighting function significantly improves signal shape and model interpretation. Moreover, VSN can be used successfully not only with constant baselines but also with more complex ones, such as polynomial baselines. The theory behind VSN is described, and its application to various synthetic multivariate data sets, as well as to real SWIR spectral data, is presented and discussed.
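To make the role of the weighting function concrete, the following is a minimal sketch of standard SNV next to a weighted variant in which the per-signal mean and standard deviation are computed with variable weights. The function names `snv` and `weighted_snv`, the weighted mean and variance formulas, and the synthetic data are illustrative assumptions; the abstract does not specify how VSN estimates the weights or exactly how they enter SNV, so the weights here are simply supplied by hand.

```python
import numpy as np


def snv(X):
    """Standard normal variate: center and scale each signal (row) by its own
    mean and (population) standard deviation."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=1, keepdims=True)
    std = X.std(axis=1, keepdims=True)
    return (X - mean) / std


def weighted_snv(X, w):
    """Assumed weighted variant of SNV: the row-wise mean and standard
    deviation are computed with variable weights w (hypothetical formulation,
    not taken from the source)."""
    X = np.asarray(X, dtype=float)
    w = np.asarray(w, dtype=float)
    w = w / w.sum()  # normalize weights so they sum to 1
    mean = (X * w).sum(axis=1, keepdims=True)
    var = (((X - mean) ** 2) * w).sum(axis=1, keepdims=True)
    return (X - mean) / np.sqrt(var)


# Synthetic example: one base shape distorted by a gain and a constant offset.
base = np.sin(np.linspace(0.0, 3.0, 50))
gains = np.array([0.5, 1.0, 2.0])[:, None]
offsets = np.array([0.1, -0.3, 1.0])[:, None]
X = gains * base + offsets

# Hand-picked weights that ignore the second half of the variables
# (stand-in for a weighting function such as the one VSN would produce).
w = np.concatenate([np.ones(25), np.zeros(25)])

X_snv = snv(X)
X_wsnv = weighted_snv(X, w)
```

In this sketch the three signals differ only by a gain and a constant baseline, so both transforms map them onto a single common shape. With uniform weights, `weighted_snv` reduces to `snv`.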
