Correspondence, principal coordinate, and redundancy analysis used on mixed chemotaxonomical qualitative and quantitative data

Abstract A mixed-type data matrix consisting of 11 quantitative carbohydrate variables and 23 binary secondary metabolite variables, measured in 5–8 isolates of each of 7 species of Penicillium, was analyzed using different multivariate statistical methods. This kind of data matrix is common in numerical taxonomy and has formerly been analyzed by consensus methods based on separate analyses of the quantitative and qualitative data matrices, by using Gower's general similarity coefficient for mixed data, or by location models. For the initial data treatment, the χ², Bray-Curtis and Canberra distance coefficients were useful for cluster analysis and for minimum spanning trees (MSTs) combined with principal coordinate analysis (PCO). The multivariate ordination methods hitherto recommended for chemotaxonomic data, principal component analysis (PCA) and its constrained ordination equivalent, partial least squares (PLS) analysis (using dummy variables for each species), gave seven quite diffuse clusters with some overlap in two-dimensional ordination plots, while correspondence analysis (CA) gave seven very clear clusters. The results indicate that qualitative data strongly dominate quantitative data and that these qualitative data are best represented in plots by correspondence analysis. However, in physiological studies the quantitative data may be considered the most important, and PCA and CA are preferred for the analysis of mixed data. Dummy-constrained PLS may be used to select quantitative variables that are species specific rather than related to climatic conditions. In classification studies at the species level, it is recommended to use correspondence analysis on mixed chemotaxonomical data. In such studies, variables based on differentiation, such as the biosynthetic families of secondary metabolites used here, give clear species separations and can be used for further cladistic analyses.
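As a rough illustration of two of the ordination methods named in the abstract, the following minimal sketch runs correspondence analysis (CA) and principal coordinate analysis (PCO) on Bray-Curtis distances for a small mixed matrix. It is not the authors' implementation: the data are synthetic, the matrix size (3 hypothetical species with 5 isolates each, 4 quantitative and 6 binary columns) is chosen only for the example, and all function and variable names are assumptions.

```python
import numpy as np

def correspondence_analysis(X, n_axes=2):
    """Row (isolate) principal coordinates from CA of a non-negative matrix."""
    X = np.asarray(X, dtype=float)
    P = X / X.sum()                      # correspondence matrix
    r = P.sum(axis=1)                    # row masses
    c = P.sum(axis=0)                    # column masses
    expected = np.outer(r, c)
    S = (P - expected) / np.sqrt(expected)   # standardized residuals
    U, sv, _ = np.linalg.svd(S, full_matrices=False)
    row_scores = (U[:, :n_axes] * sv[:n_axes]) / np.sqrt(r)[:, None]
    return row_scores, sv ** 2           # scores and principal inertias

def bray_curtis(X):
    """Pairwise Bray-Curtis dissimilarities between the rows of X."""
    X = np.asarray(X, dtype=float)
    num = np.abs(X[:, None, :] - X[None, :, :]).sum(axis=2)
    den = (X[:, None, :] + X[None, :, :]).sum(axis=2)
    return num / den

def pco(D, n_axes=2):
    """Principal coordinate analysis (classical scaling) of a distance matrix."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    B = -0.5 * J @ (D ** 2) @ J          # double-centered Gower matrix
    vals, vecs = np.linalg.eigh(B)
    order = np.argsort(vals)[::-1][:n_axes]
    # Non-Euclidean coefficients can give negative eigenvalues; clip them to 0.
    return vecs[:, order] * np.sqrt(np.clip(vals[order], 0.0, None))

# Synthetic stand-in data (NOT the Penicillium data set).
rng = np.random.default_rng(0)
n_species, n_isolates = 3, 5
quantitative = rng.gamma(shape=2.0, scale=1.0, size=(n_species * n_isolates, 4))
pattern = np.array([[1, 1, 0, 0, 0, 0],
                    [0, 0, 1, 1, 0, 0],
                    [0, 0, 0, 0, 1, 1]])
binary = np.repeat(pattern, n_isolates, axis=0)   # species-specific metabolites
X = np.hstack([quantitative, binary])

ca_scores, inertia = correspondence_analysis(X)
D = bray_curtis(X)
pco_scores = pco(D)
print(ca_scores.shape, pco_scores.shape)
```

Plotting the first two columns of `ca_scores` or `pco_scores`, coloured by species, gives the kind of two-dimensional ordination plot the abstract refers to. How strongly the binary columns dominate depends on how the quantitative columns are scaled, which this sketch does not attempt to reproduce.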
