Partial least squares discriminant analysis: taking the magic away

Partial least squares discriminant analysis (PLS‐DA) has been available for nearly 20 years yet is poorly understood by most users. Using simple examples, it is shown graphically and algebraically that, for two classes of equal size, PLS‐DA using one partial least squares (PLS) component provides classification results equivalent to Euclidean distance to centroids, and that using all nonzero components provides results equivalent to linear discriminant analysis. Extensions to unequal class sizes and to more than two classes are discussed, including common pitfalls and dilemmas. Finally, the problems of overfitting and of interpreting PLS scores plots are discussed. It is concluded that, for classification purposes, PLS‐DA has no significant advantages over traditional procedures and is an algorithm full of dangers. It should not be viewed as a single integrated method but as a step in a full classification procedure. However, despite these limitations, PLS‐DA can provide good insight into the causes of discrimination via weights and loadings, which gives it a unique role in exploratory data analysis, for example in metabolomics, through visualisation of significant variables such as metabolites or spectroscopic peaks. Copyright © 2014 John Wiley & Sons, Ltd.
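
The first equivalence stated above can be checked numerically. The following is a minimal sketch, not the paper's own code: it assumes a ±1 class coding, column mean-centring, a decision threshold of zero, and synthetic Gaussian data, and the function names `pls1_one_component_predict` and `edc_predict` are hypothetical. With equal class sizes, the single PLS weight vector is proportional to the difference between the two class centroids and the overall mean lies midway between them, so the one-component PLS‐DA boundary coincides with the perpendicular bisector used by Euclidean distance to centroids.

```python
import numpy as np


def pls1_one_component_predict(X_train, y_train, X_new):
    """One-component PLS1 on a +1/-1 coded response, thresholded at zero."""
    x_mean = X_train.mean(axis=0)
    y_mean = y_train.mean()
    Xc = X_train - x_mean              # column mean-centring (assumed preprocessing)
    yc = y_train - y_mean
    w = Xc.T @ yc                      # PLS weight vector, proportional to X'y
    w /= np.linalg.norm(w)
    t = Xc @ w                         # training scores on the single component
    q = (t @ yc) / (t @ t)             # regression of y on the scores
    y_hat = y_mean + ((X_new - x_mean) @ w) * q
    return np.where(y_hat >= 0, 1, -1)


def edc_predict(X_train, y_train, X_new):
    """Assign each sample to the class with the nearer centroid."""
    c_pos = X_train[y_train == 1].mean(axis=0)
    c_neg = X_train[y_train == -1].mean(axis=0)
    d_pos = np.linalg.norm(X_new - c_pos, axis=1)
    d_neg = np.linalg.norm(X_new - c_neg, axis=1)
    return np.where(d_pos <= d_neg, 1, -1)


# Synthetic two-class data with equal class sizes (illustrative only).
rng = np.random.default_rng(0)
n_per_class = 20
X = np.vstack([
    rng.normal(0.0, 1.0, size=(n_per_class, 5)),
    rng.normal(1.0, 1.0, size=(n_per_class, 5)),
])
y = np.concatenate([np.ones(n_per_class), -np.ones(n_per_class)])
X_new = rng.normal(0.5, 1.5, size=(200, 5))

pls_labels = pls1_one_component_predict(X, y, X_new)
edc_labels = edc_predict(X, y, X_new)
print("Identical class assignments:", np.array_equal(pls_labels, edc_labels))
```

If the two classes are given different sizes in this sketch, the overall mean no longer lies midway between the centroids and the two sets of labels can differ, which is one of the pitfalls the abstract refers to.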
