Partial Least Squares Methods for Non-Metric Data

Partial Least Squares (PLS) methods encompass a suite of data analysis techniques based on algorithms belonging to the PLS family. These algorithms are various extensions of the Nonlinear estimation by Iterative PArtial Least Squares (NIPALS) algorithm, which Herman Wold proposed as an alternative algorithm for performing Principal Component Analysis. The distinctive feature of this algorithm is that it computes principal components through an iterative sequence of simple ordinary least squares regressions. This makes it possible to overcome computational problems caused by missing data or by landscape data matrices, i.e. matrices with more columns than rows. PLS methods were originally designed for data sets forming metric spaces, which implies that all the variables included in the analysis are observed on interval or ratio scales.

In this work we show how NIPALS-based algorithms, properly adjusted, can work as optimal scaling algorithms. This property of PLS, which had so far been entirely unexplored, allowed us to devise a new suite of PLS methods: the Non-Metric PLS (NM-PLS) methods. NM-PLS methods can be used with different aims:
- to analyze variables observed on different measurement scales at the same time;
- to investigate nonlinearity;
- to drop the strong assumption of linearity in favor of the milder assumption of monotonicity.

In particular, these methods generalize standard NIPALS, PLS Regression and PLS Path Modeling so that they can handle variables observed on a variety of measurement scales and cope with nonlinearity problems. Three new algorithms are proposed to implement NM-PLS methods: the Non-Metric NIPALS algorithm, the Non-Metric PLS Regression algorithm, and the Non-Metric PLS Path Modeling algorithm. Each of these algorithms provides, at the same time, the specific PLS model parameters and the scaling values for the variables to be scaled.
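The core NIPALS idea, computing a principal component by alternating simple least squares regressions, can be sketched in plain Python as follows. This is a minimal single-component illustration, not the NM-PLS algorithms themselves; the function names are hypothetical, and missing cells are represented as `None` and simply left out of each regression sum, which is how NIPALS copes with incomplete data. The sketch assumes that every row and every column has at least one observed cell.

```python
from typing import List, Optional, Tuple

# A data matrix as a list of rows; None marks a missing cell.
Matrix = List[List[Optional[float]]]


def center_columns(X: Matrix) -> Matrix:
    """Subtract from each column the mean of its observed (non-None) cells."""
    n, p = len(X), len(X[0])
    means = []
    for j in range(p):
        observed = [X[i][j] for i in range(n) if X[i][j] is not None]
        means.append(sum(observed) / len(observed))
    return [[None if X[i][j] is None else X[i][j] - means[j] for j in range(p)]
            for i in range(n)]


def nipals_first_component(X: Matrix, tol: float = 1e-12, max_iter: int = 500
                           ) -> Tuple[List[float], List[float]]:
    """Scores and unit-length loadings of the first component of a centred X.

    Assumes every row and every column has at least one observed cell.
    """
    n, p = len(X), len(X[0])
    # Initialise the score vector with the first column (missing cells -> 0).
    scores = [0.0 if X[i][0] is None else X[i][0] for i in range(n)]
    loadings = [0.0] * p
    for _ in range(max_iter):
        # OLS of each column x_j on the scores (no intercept, observed rows only).
        for j in range(p):
            rows = [i for i in range(n) if X[i][j] is not None]
            loadings[j] = (sum(X[i][j] * scores[i] for i in rows)
                           / sum(scores[i] ** 2 for i in rows))
        norm = sum(v * v for v in loadings) ** 0.5
        loadings = [v / norm for v in loadings]
        # OLS of each row x_i on the loadings (observed columns only).
        new_scores = []
        for i in range(n):
            cols = [j for j in range(p) if X[i][j] is not None]
            new_scores.append(sum(X[i][j] * loadings[j] for j in cols)
                              / sum(loadings[j] ** 2 for j in cols))
        change = sum((a - b) ** 2 for a, b in zip(new_scores, scores))
        scores = new_scores
        if change < tol:
            break
    return scores, loadings
```

With complete data the loop converges to the dominant principal component (up to sign); in NIPALS, further components are obtained by deflating X with the current component and repeating the same loop.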
The scaling values provided by the NM-PLS algorithms are proved to be optimal, in the sense that they optimize the same criterion as the model in which they are involved. Moreover, they are suitable, since they satisfy the constraints determined by which properties of the original measurement scale are to be preserved.
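A single least squares optimal scaling step can be sketched as follows. This is a generic quantification in the style of classical alternating least squares optimal scaling, not necessarily the exact update used by the NM-PLS algorithms; the names `pava` and `optimal_scaling`, and the choice to standardize with the 1/n convention, are our assumptions. Given a current target score `z` (for instance a component or latent variable estimate), a nominal variable receives, for each category, the mean of `z` over that category; an ordinal variable receives those means made non-decreasing along the category order by weighted isotonic regression, which is one way to enforce the constraint of preserving the order of the original scale.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Sequence


def pava(values: List[float], weights: List[float]) -> List[float]:
    """Weighted pool-adjacent-violators: least squares non-decreasing fit."""
    # Each block holds [weighted mean, total weight, number of pooled entries].
    blocks: List[list] = []
    for v, w in zip(values, weights):
        blocks.append([v, w, 1])
        # Merge backwards while the monotonicity constraint is violated.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            v2, w2, c2 = blocks.pop()
            v1, w1, c1 = blocks.pop()
            blocks.append([(v1 * w1 + v2 * w2) / (w1 + w2), w1 + w2, c1 + c2])
    fitted: List[float] = []
    for v, _, c in blocks:
        fitted.extend([v] * c)
    return fitted


def optimal_scaling(labels: Sequence[str], z: Sequence[float],
                    order: Optional[Sequence[str]] = None) -> Dict[str, float]:
    """Quantify a categorical variable so that it best fits the target z.

    Nominal level (order=None): each category gets the mean of z over its
    observations. Ordinal level: those means are made monotone along `order`
    by weighted isotonic regression. The quantified variable is then
    standardized (zero mean, unit variance, 1/n convention).
    Assumes every category in `order` is observed and the result is not constant.
    """
    sums: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for lab, zi in zip(labels, z):
        sums[lab] += zi
        counts[lab] += 1
    cats = list(order) if order is not None else sorted(counts)
    quant = [sums[c] / counts[c] for c in cats]
    if order is not None:
        quant = pava(quant, [float(counts[c]) for c in cats])
    # Standardize the observation-level variable implied by the quantifications.
    n = len(labels)
    mean = sum(counts[c] * q for c, q in zip(cats, quant)) / n
    var = sum(counts[c] * (q - mean) ** 2 for c, q in zip(cats, quant)) / n
    sd = var ** 0.5
    return {c: (q - mean) / sd for c, q in zip(cats, quant)}
```

In an alternating scheme of this kind, the quantified values would replace the raw categories, the PLS step would then produce a new target score, and the two steps would be repeated until the common criterion stops improving.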
