Hot PLS—a framework for hierarchically ordered taxonomic classification by partial least squares

Abstract A novel framework for classification by partial least squares in a fixed hierarchy is presented. The hierarchical approach allows flexible local modelling of varying complexity and yields an intuitive classification path from the highest taxonomic levels down to species and beyond. Results are presented as phylogenetic trees annotated with local diagnostic information, so that as much information as possible is gained about the classification and the researcher can focus on interesting phenomena. Information on sample replicates is included in the classification to increase performance and to avoid misclassifications caused by low-quality measurements. Samples from previously unobserved classes are detected by estimating cut-off distances from the classes in the calibration data. To further increase flexibility and customization, the canonical powered partial least squares algorithm is used for modelling and classification together with linear discriminant analysis. This makes it possible to include additional sample response information and to force a sharper focus on important variables. Although first developed for biological taxonomy, the presented framework is not limited to this purpose.
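The ideas summarized in the abstract (a fixed hierarchy of local models, cut-off distances estimated from the calibration classes, and pooling of replicates) can be illustrated with a minimal sketch. This is not the authors' implementation: a plain nearest-centroid rule stands in for the canonical powered partial least squares and linear discriminant analysis models, the cut-off is assumed to be a hypothetical `slack` factor times the largest calibration distance in each class, replicates are assumed to be combined by majority vote, and all names (`Node`, `predict_replicates`, the toy data) are invented for illustration. All taxonomic paths are assumed to have the same depth.

```python
import math
from collections import Counter


def centroid(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def dist(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class Node:
    """One node of a fixed hierarchy holding a local nearest-centroid model.

    Nearest centroid is a stand-in for the local PLS/LDA model of the paper.
    """

    def __init__(self):
        self.children = {}   # label -> Node for the next taxonomic level
        self.centroids = {}  # label -> class centroid at this level
        self.cutoffs = {}    # label -> cut-off distance for novelty detection

    def fit(self, samples, paths, depth=0, slack=1.5):
        # Group calibration samples by their label at the current level.
        groups = {}
        for x, p in zip(samples, paths):
            groups.setdefault(p[depth], []).append((x, p))
        for label, members in groups.items():
            xs = [x for x, _ in members]
            c = centroid(xs)
            self.centroids[label] = c
            # Assumed rule: cut-off = slack * largest calibration distance.
            self.cutoffs[label] = slack * max(dist(x, c) for x in xs)
            # Recurse only if a deeper taxonomic level exists.
            if any(len(p) > depth + 1 for _, p in members):
                child = Node()
                child.fit(xs, [p for _, p in members], depth + 1, slack)
                self.children[label] = child
        return self

    def predict(self, x):
        """Return the classification path, ending in '<unknown>' if the
        sample lies beyond the cut-off of its nearest class."""
        label = min(self.centroids, key=lambda k: dist(x, self.centroids[k]))
        if dist(x, self.centroids[label]) > self.cutoffs[label]:
            return ["<unknown>"]
        child = self.children.get(label)
        return [label] + (child.predict(x) if child else [])


def predict_replicates(root, replicates):
    """Majority vote over the full paths predicted for each replicate."""
    votes = Counter(tuple(root.predict(x)) for x in replicates)
    return list(votes.most_common(1)[0][0])


# Toy calibration data: two genera, three species (hypothetical values).
X = [[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [1.1, 0.9], [5.0, 5.0], [5.2, 5.1]]
paths = [("GenusA", "sp1"), ("GenusA", "sp1"), ("GenusA", "sp2"),
         ("GenusA", "sp2"), ("GenusB", "sp3"), ("GenusB", "sp3")]
root = Node().fit(X, paths)

print(root.predict([0.15, 0.05]))
print(root.predict([0.6, 0.5]))
print(predict_replicates(root, [[0.15, 0.05], [0.1, 0.1], [20.0, 20.0]]))
```

In this sketch a sample can be accepted at one level and rejected at the next, so the returned path shows how far down the hierarchy the classification could be trusted. A single outlying replicate is outvoted by the others rather than determining the result.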
