Extension of SO-PLS to multi-way arrays: SO-N-PLS

Abstract. Multi-way data arrays are becoming more common in several fields of science. For instance, some analytical instruments collect signals in several modes simultaneously, e.g. fluorescence spectroscopy and LC/GC-MS. Higher-order data also arise in sensory science, where product scores can be reported as a function of sample, judge and attribute. Another example is process monitoring, where several process variables are measured over time for several batches. In addition, so-called multi-block data sets, in which several blocks of data describe the same set of samples, are becoming more common. Several methods exist for analyzing either multi-way or multi-block data, but methods that combine these two data properties have received little attention. A common procedure is to "unfold" multi-way arrays into two-way data tables to which classical multi-block methods can be applied. However, unfolding is known to risk overfitted models because it increases the flexibility in parameter estimation. In this paper we present a novel multi-block regression method that can handle multi-way data blocks. The method combines a multi-block method called Sequential and Orthogonalized PLS (SO-PLS) with the multi-way version of PLS, N-PLS, and is therefore called SO-N-PLS. We compare it to Multi-block PLS (MB-PLS) and SO-PLS applied to unfolded data. We investigate the hypotheses that SO-N-PLS performs better on small and noisy data sets, and that SO-N-PLS models are easier to interpret. The hypotheses are investigated by a simulation study and two real data examples, one dealing with regression and one with classification. The simulation study shows that SO-N-PLS predicts better than the unfolded methods when the sample size is small and the data are noisy, because it filters out the noise better than MB-PLS and SO-PLS. For the real data examples, the differences in prediction are small, but the multi-way method allows easier interpretation.
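
To make the unfolding argument concrete, the following is a minimal sketch in Python with NumPy, not the authors' implementation. The array dimensions are hypothetical and the weights are random rather than fitted. It shows how a three-way block is unfolded into a two-way table, and how many weight parameters per component an unfolded model carries compared with a trilinear weight of the kind used in N-PLS (one weight vector per non-sample mode, ignoring normalization constraints).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical three-way block: samples x mode J x mode K
# (for example, samples x emission x excitation wavelengths).
n_samples, n_j, n_k = 20, 30, 15
X = rng.normal(size=(n_samples, n_j, n_k))

# Unfolding keeps the sample mode and concatenates the other two modes,
# giving an ordinary two-way table with J*K columns.
X_unfolded = X.reshape(n_samples, n_j * n_k)

# Weight parameters per component:
params_unfolded = n_j * n_k   # one free weight per unfolded column
params_trilinear = n_j + n_k  # one weight vector per mode

# A trilinear weight is the outer product of two mode-specific vectors.
# Random vectors stand in for fitted weights here (assumption).
w_j = rng.normal(size=n_j)
w_k = rng.normal(size=n_k)
W = np.outer(w_j, w_k)

# The score vector is the same whether it is computed on the array
# or on the unfolded table with the vectorized weight.
t_multiway = np.einsum("ijk,jk->i", X, W)
t_unfolded = X_unfolded @ W.reshape(-1)

print(X_unfolded.shape, params_unfolded, params_trilinear)
print(np.allclose(t_multiway, t_unfolded))
```

With these dimensions the unfolded weight has 450 entries per component against 45 for the trilinear one, which illustrates the extra flexibility that the abstract associates with overfitting on small, noisy data sets.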
