A modification of canonical variates analysis to handle highly collinear multivariate data

A modification of the standard Canonical Variates Analysis (CVA) method is developed to cope with collinear, high‐dimensional data. The method uses Partial Least Squares (PLS) regression as an engine for solving an eigenvector problem involving singular covariance matrices. Three data sets are analyzed to demonstrate the properties of the method: a two‐group problem with near‐infrared spectroscopic data (60 samples, 376 variables), a six‐group problem with fluorescence spectroscopic data (83 samples, 1023 variables), and a three‐group problem with physical‐chemical data (41 samples, 10 variables). It is demonstrated that the modified CVA method forces the discriminative information into the first canonical variates, as expected. The weight vectors found by the modified CVA method possess the same properties as the weight vectors of the standard CVA method. Combining the suggested method with a classifier such as Linear Discriminant Analysis (LDA) yields an operational tool for classification and discrimination of collinear data. Copyright © 2007 John Wiley & Sons, Ltd.
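
To make the singularity problem concrete, the textbook CVA criterion can be written out as follows. This is a sketch in standard notation, not a reproduction of the paper's derivation: the symbols $\mathbf{B}$ (between‐group covariance), $\mathbf{W}$ (pooled within‐group covariance), $\mathbf{w}$ (weight vector), $\mathbf{d}$ (difference of group means), $p$ (variables), $n$ (samples) and $g$ (groups) are assumptions introduced here. The final regression form is only one plausible way a PLS step could be posed for the two‐group case.

```latex
% Standard CVA: maximize between-group relative to within-group variance.
J(\mathbf{w}) = \frac{\mathbf{w}^{\top}\mathbf{B}\,\mathbf{w}}
                     {\mathbf{w}^{\top}\mathbf{W}\,\mathbf{w}}
\quad\Longrightarrow\quad
\mathbf{B}\,\mathbf{w} = \lambda\,\mathbf{W}\,\mathbf{w}

% Only when W is invertible does this reduce to an ordinary eigenproblem:
\mathbf{W}^{-1}\mathbf{B}\,\mathbf{w} = \lambda\,\mathbf{w}

% The pooled within-group matrix has limited rank, so it is singular
% whenever the number of variables exceeds n - g:
\operatorname{rank}(\mathbf{W}) \le \min(p,\; n - g)
% e.g. the near-infrared set: p = 376, n = 60, g = 2, so rank(W) <= 58 < 376.

% Two groups: B is rank one, B \propto d d^T with d = \bar{x}_1 - \bar{x}_2,
% so B w is parallel to d and (for \lambda \neq 0) the eigenproblem becomes
% a linear system:
\mathbf{W}\,\mathbf{w} \;\propto\; \mathbf{d}

% Read as a regression of d on the columns of W (assumed formulation):
\mathbf{d} = \mathbf{W}\,\mathbf{w} + \mathbf{e}
```

Written this way, $\mathbf{w}$ plays the role of a regression coefficient vector, so a PLS regression with a small number of components can estimate it without ever inverting $\mathbf{W}$.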
