Multiclass partial least squares discriminant analysis: Taking the right way - A critical tutorial

Journal of Chemometrics. 2018;32:e3030. https://doi.org/10.1002/cem.3030

Abstract

Here, the theory of the multi‐class partial least squares discriminant analysis (PLS‐DA) is presented. A distinct feature of this theory is that it does not utilize PLS scores but is entirely based on the predicted dummy responses. It is shown that the results of the multi‐class PLS‐DA can be presented in a straightforward way by projecting the response matrix on the “super‐score” space by means of principal component analysis. Two approaches to discrimination are considered: the hard and the soft way of allocation. Correspondingly, 2 versions of PLS‐DA are presented: the conventional hard PLS‐DA, and the newly introduced soft PLS‐DA that seems to be a novel approach in chemometrics. The quality of classification is assessed using the figures of merit (sensitivity, specificity, and efficiency). It is shown how these characteristics are used for the selection of the model complexity. A number of practical problems are investigated, such as unbalanced sizes of classes, comparison of the discriminant and the class‐modeling methods and authentication by the “one against all” strategy. The paper is illustrated by real‐world and simulated examples.
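The figures of merit named in the abstract can be illustrated with a minimal sketch. It assumes that class labels are coded as integers, that each object is assigned to exactly one class (the hard way of allocation), and that efficiency is taken as the geometric mean of sensitivity and specificity; the abstract does not spell out that last definition, so treat it as an assumption. The function name `figures_of_merit` and the toy labels are hypothetical and are not taken from the paper.

```python
import numpy as np

def figures_of_merit(y_true, y_pred, n_classes):
    """Per-class sensitivity, specificity and efficiency for hard allocation.

    Assumes every class has at least one member and one non-member,
    so that neither denominator below is zero.
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    result = {}
    for k in range(n_classes):
        is_k = y_true == k          # objects that truly belong to class k
        assigned_k = y_pred == k    # objects allocated to class k
        tp = np.sum(is_k & assigned_k)
        fn = np.sum(is_k & ~assigned_k)
        tn = np.sum(~is_k & ~assigned_k)
        fp = np.sum(~is_k & assigned_k)
        sensitivity = tp / (tp + fn)   # share of class-k objects recognised
        specificity = tn / (tn + fp)   # share of alien objects rejected
        # Assumption: efficiency = geometric mean of the two rates.
        efficiency = np.sqrt(sensitivity * specificity)
        result[k] = (sensitivity, specificity, efficiency)
    return result

# Toy labels (invented for illustration): three classes, ten objects.
y_true = [0, 0, 0, 1, 1, 1, 2, 2, 2, 2]
y_pred = [0, 0, 1, 1, 1, 1, 2, 2, 0, 2]
fom = figures_of_merit(y_true, y_pred, n_classes=3)
for k, (sens, spec, eff) in fom.items():
    print(f"class {k}: sensitivity={sens:.3f} specificity={spec:.3f} efficiency={eff:.3f}")
```

In a model-complexity study, such values would be recomputed for each candidate number of PLS components and compared; that loop is omitted here because it depends on the PLS implementation used.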
