Pattern Recognition Studies of Complex Chromatographic Data Sets

Abstract
Chromatographic fingerprinting of complex biological and environmental samples is an active research area with a large and growing literature. Multivariate statistical and pattern recognition techniques can be effective methods for analyzing such complex data. However, classifying complex samples on the basis of their chromatographic profiles is complicated by two factors: (1) confounding of the desired group information by experimental variables or other systematic variations, and (2) random, or chance, classification effects with linear discriminants. Several projects that exhibit these effects, and methods for dealing with them, are discussed.
Complex chromatographic data sets often contain information that depends on experimental variables as well as information that differentiates the classes. Such complicating relationships are inherent in fingerprint-type data. ADAPT, an interactive computer software system, has the clustering, mapping, and statistical tools needed to identify and study these effects in realistically large data sets.
In one study, pattern recognition analysis of 144 pyrochromatograms from cultured skin fibroblasts was used to differentiate cystic fibrosis carriers from presumed normal donors. Several experimental variables (donor gender, chromatographic column, etc.) were observed to contribute to the overall classification process. Despite these effects, discriminants were developed from the chromatographic peaks that assigned each pyrochromatogram to its class (cystic fibrosis carrier versus normal) largely on the basis of the desired pathological difference. In another study, gas chromatographic profiles of cuticular hydrocarbon extracts from 170 red fire ant samples were analyzed using pattern recognition methods. Clustering according to the biological variables of social caste and colony was observed.
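The abstract does not say how ADAPT computes its discriminants. As a minimal sketch, assuming the discriminant is an error-correction "linear learning machine" (here the fixed-increment perceptron rule, with a constant term appended so the hyperplane need not pass through the origin), the code below autoscales peak areas and adjusts a weight vector until every training pattern falls on the correct side. The function names `autoscale`, `train_linear_machine`, and `classify` and the toy data are hypothetical illustrations, not ADAPT's interface.

```python
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def autoscale(X: Sequence[Sequence[float]]) -> List[Vector]:
    """Scale each feature (e.g. one peak area) to zero mean and unit variance."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    stds = []
    for j in range(d):
        var = sum((row[j] - means[j]) ** 2 for row in X) / n
        stds.append(var ** 0.5 or 1.0)  # constant features keep a divisor of 1
    return [[(row[j] - means[j]) / stds[j] for j in range(d)] for row in X]


def train_linear_machine(
    X: Sequence[Vector], y: Sequence[int], max_epochs: int = 1000, seed: int = 0
) -> Tuple[Vector, bool]:
    """Fixed-increment error-correction training of a two-class linear discriminant.

    Labels must be +1 or -1. Each pattern is augmented with a constant 1 so the
    last weight acts as the threshold. Returns (weights, converged), where
    converged means one full pass made no errors on the training set.
    """
    rng = random.Random(seed)
    w = [0.0] * (len(X[0]) + 1)
    order = list(range(len(X)))
    for _ in range(max_epochs):
        rng.shuffle(order)
        errors = 0
        for i in order:
            x = list(X[i]) + [1.0]
            score = sum(wk * xk for wk, xk in zip(w, x))
            if y[i] * score <= 0:  # wrong side of (or on) the hyperplane
                # Shift the hyperplane toward placing this pattern correctly.
                w = [wk + y[i] * xk for wk, xk in zip(w, x)]
                errors += 1
        if errors == 0:
            return w, True
    return w, False


def classify(w: Vector, x: Sequence[float]) -> int:
    """Assign +1 or -1 from the sign of the discriminant score."""
    score = sum(wk * xk for wk, xk in zip(w, list(x) + [1.0]))
    return 1 if score > 0 else -1


if __name__ == "__main__":
    # Hypothetical peak areas for four samples with two peaks each (not real data).
    X = autoscale([[1.0, 2.0], [1.2, 1.8], [3.0, 0.5], [3.3, 0.4]])
    y = [1, 1, -1, -1]
    w, ok = train_linear_machine(X, y)
    print(ok, [classify(w, x) for x in X])  # prints: True [1, 1, -1, -1]
```

Under the same assumptions, one rough way to probe the confounding described above is to retrain with labels taken from an experimental variable (for example, chromatographic column) instead of the class. If that split is also perfectly learnable, the class discriminant may be partly exploiting it. This is an illustrative heuristic, not a procedure stated in the abstract.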
Previously, Monte Carlo simulation studies have been carried out to assess the probability of chance classification by nonparametric linear discriminants. These studies examined the expected level of chance classification as a function of the number of observations, the dimensionality, the class membership distribution, and the covariance structure of the data. The simulations established limits on the approaches that can be taken with real data sets if chance classifications are to remain improbable.
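The abstract summarizes the earlier Monte Carlo studies without giving their procedure. A minimal sketch of that kind of experiment follows, under assumptions that are ours: features are independent standard normal values (covariance structure is not modelled), each label is +1 with probability `p_pos` regardless of the features, and a fixed-increment perceptron stands in for the nonparametric linear discriminant. Because the perceptron may not converge within the epoch budget even when the labels are separable, the simulated rate is a lower-bound estimate. For fair-coin labels, Cover's function-counting theorem gives the exact probability that n points in general position in d dimensions can be split by a hyperplane with a constant term, 2^(1-n) times the sum over k = 0..d of C(n-1, k), included here as a cross-check. All function names are hypothetical.

```python
import math
import random
from typing import List


def perceptron_separates(X: List[List[float]], y: List[int],
                         max_epochs: int, rng: random.Random) -> bool:
    """True if fixed-increment perceptron training reaches zero training errors.

    Non-convergence within max_epochs does not prove the labels are linearly
    inseparable, so rates built on this function are lower-bound estimates.
    """
    w = [0.0] * (len(X[0]) + 1)
    order = list(range(len(X)))
    for _ in range(max_epochs):
        rng.shuffle(order)
        errors = 0
        for i in order:
            x = X[i] + [1.0]  # constant term lets the hyperplane miss the origin
            if y[i] * sum(a * b for a, b in zip(w, x)) <= 0:
                w = [a + y[i] * b for a, b in zip(w, x)]
                errors += 1
        if errors == 0:
            return True
    return False


def chance_separation_rate(n: int, d: int, p_pos: float = 0.5, trials: int = 100,
                           max_epochs: int = 100, seed: int = 1) -> float:
    """Fraction of trials in which meaningless labels are perfectly separated.

    Each trial draws n patterns of d independent N(0, 1) features and labels
    each pattern +1 with probability p_pos, independently of its features.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        X = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
        y = [1 if rng.random() < p_pos else -1 for _ in range(n)]
        if perceptron_separates(X, y, max_epochs, rng):
            hits += 1
    return hits / trials


def cover_probability(n: int, d: int) -> float:
    """Exact probability that n points in general position in d dimensions,
    with fair-coin labels, are separable by an affine hyperplane (Cover's theorem)."""
    return sum(math.comb(n - 1, k) for k in range(d + 1)) / 2 ** (n - 1)


if __name__ == "__main__":
    # Chance separation drops sharply as the ratio n/d grows.
    for n in (6, 10, 20, 40):
        est = chance_separation_rate(n=n, d=5)
        print(f"n={n:2d} d=5 simulated={est:.2f} exact={cover_probability(n, 5):.3f}")
```

With d = 5, any labelling of six or fewer points is separable (the exact probability is 1), whereas at 40 points a perfect split of random labels is very unlikely. This is the sense in which the ratio of observations to dimensions limits how far a perfect training-set classification can be trusted.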

[1] Olav M. Kvalheim et al. SIMCA multivariate data analysis of blue mussel components in environmental pollution studies. 1983.

[2] E. Jellum et al. Profiling of human body fluids in healthy and diseased states using gas chromatography and mass spectrometry, with special reference to organic acids. 1977, Journal of Chromatography.

[3] Peter C. Jurs et al. Pattern recognition studies of complex chromatographic data sets. 1986.

[4] Terry R. Stouch et al. Monte Carlo studies of the classifications made by nonparametric linear discriminant functions. 1985, J. Chem. Inf. Comput. Sci.

[5] A. Zlatkis et al. The role of organic volatile profiles in clinical diagnosis. 1981, Clinical Chemistry.

[6] E. Reiner et al. Botulism: a pyrolysis-gas-liquid chromatographic study. 1978, Journal of Chromatographic Science.

[7] A. J. Stuper et al. Computer assisted studies of chemical structure and biological function. 1979.

[8] A. J. Stuper et al. Nonparametric feature selection in pattern recognition applied to chemical problems. 1975.

[9] E. Reiner et al. Differentiation of normal and pathological cells by pyrolysis-GLC. 1972.

[10] J. L. Fasching et al. Chemometrics and liquid chromatography in the study of acute lymphocytic leukemia. 1983.

[11] S. Wold et al. Classification of human cancer cells by means of capillary gas chromatography and pattern recognition analysis. 1981, Journal of Chromatography.

[12] S. Wold et al. Application of SIMCA multivariate data analysis to the classification of gas chromatographic profiles of human brain tissues. 1981.

[13] D. Wojcik et al. Chemical Mimicry in the Myrmecophilous Beetle Myrmecaphodius excavaticollis. 1982, Science.

[14] L. Kryger et al. Interpretation of analytical chemical information by pattern recognition methods: a survey. 1981, Talanta.

[15] Julius T. Tou et al. Pattern Recognition Principles. 1974.

[16] Thomas L. Isenhour et al. Chemical applications of pattern recognition. 1975.

[17] I. Moriguchi et al. Adaptive least-squares method applied to structure-activity correlation of hypotensive N-alkyl-N''-cyano-N'-pyridylguanidines. 1980, Journal of Medicinal Chemistry.

[18] P. Jurs et al. The Probability of Dichotomization by a Binary Linear Classifier as a Function of Training Set Population Distribution. 1979.

[19] Peter C. Jurs et al. Reliability of Nonparametric Linear Classifiers. 1976, J. Chem. Inf. Comput. Sci.

[20] M. L. McConnell et al. Application of pattern recognition and feature extraction techniques to volatile constituent metabolic profiles obtained by capillary gas chromatography. 1979, Journal of Chromatography.

[21] B. K. Lavine et al. Application of pyrolysis/gas chromatography/pattern recognition to the detection of cystic fibrosis heterozygotes. 1985, Analytical Chemistry.

[22] Kurt Varmuza et al. Pattern recognition in chemistry. 1980.

[23] M. L. McConnell et al. Metabolic abnormalities associated with diabetes mellitus, as investigated by gas chromatography and pattern-recognition analysis of profiles of volatile metabolites. 1981, Clinical Chemistry.