OnPLS—a novel multiblock method for the modelling of predictive and orthogonal variation

This paper presents a new multiblock analysis method called OnPLS, a general extension of O2PLS to the multiblock case. The proposed method is equivalent to O2PLS in cases involving only two matrices, but generalises to cases involving more than two matrices without giving preference to any particular matrix: the method is fully symmetric. OnPLS extracts a minimal number of globally predictive components that exhibit maximal covariance and correlation. Furthermore, the method can be used to study orthogonal variation, i.e. local phenomena captured in the data that are specific to individual combinations of matrices or to individual matrices. The method's utility was demonstrated by its application to three synthetic data sets. It was shown that OnPLS affords a reduced number of globally predictive components and increased intercorrelations of scores, and that it greatly facilitates interpretation of the predictive model. Copyright © 2011 John Wiley & Sons, Ltd.
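The abstract does not spell out the algorithm, so the following is only a minimal sketch of the two-block starting point it refers to, not the OnPLS procedure itself. It assumes that shared ("globally predictive") directions between two column-centred matrices can be estimated from the singular value decomposition of their cross-product matrix, as is commonly done in two-block covariance methods. The function name `joint_directions`, the synthetic data, and the loadings are all hypothetical and chosen purely for illustration.

```python
import numpy as np

def joint_directions(X, Y, n_components=1):
    """Leading covariance directions shared by two blocks (illustrative only)."""
    # Column-centre each block so the cross-product reflects covariance.
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Singular vectors of Xc^T Yc maximise the covariance between the scores.
    U, s, Vt = np.linalg.svd(Xc.T @ Yc, full_matrices=False)
    W = U[:, :n_components]      # weights for block X
    C = Vt[:n_components].T      # weights for block Y
    return W, C, Xc @ W, Yc @ C  # weights and the corresponding scores

# Hypothetical synthetic data: one latent variable shared by both blocks,
# plus one latent variable present only in X (variation "orthogonal" to Y).
rng = np.random.default_rng(0)
n = 200
shared = rng.normal(size=(n, 1))
unique_x = rng.normal(size=(n, 1))
p_shared = np.array([[1.0, 1.0, 1.0, 0.0, 0.0, 0.0]])  # assumed loadings
p_unique = np.array([[0.0, 0.0, 0.0, 1.0, 1.0, 1.0]])  # assumed loadings
q_shared = np.array([[1.0, -1.0, 1.0, -1.0]])          # assumed loadings

X = shared @ p_shared + unique_x @ p_unique + 0.05 * rng.normal(size=(n, 6))
Y = shared @ q_shared + 0.05 * rng.normal(size=(n, 4))

W, C, T, U_scores = joint_directions(X, Y)
score_corr = np.corrcoef(T[:, 0], U_scores[:, 0])[0, 1]
print(f"correlation between joint scores: {score_corr:.3f}")
```

In this toy setup the X-specific latent variable has loadings orthogonal to the shared ones, so the joint scores of the two blocks are almost perfectly correlated. When block-specific variation overlaps with the shared directions, that correlation drops, which is the situation the orthogonal components described in the abstract are meant to handle.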
