Application of dissimilarity indices, principal coordinates analysis, and rank tests to peak tables in metabolomics of the gas chromatography/mass spectrometry of human sweat.

Most metabolomics studies use methods based on principal components analysis (PCA) and partial least-squares, primarily to determine whether samples fall within large groups. However, analytical chemists rarely tackle the problem of individual fingerprinting. Doing this effectively requires studying a large number of small groups rather than a small number of large groups, which calls for different approaches, as described in this paper. Furthermore, many metabolomic studies on mammals and humans involve analyzing compounds (or peaks) that are present in only a certain portion of samples, and conventional PCA approaches do not cope well with sparse matrices that contain many zeros. There are, however, many qualitative similarity measures available for this purpose that can be exploited via principal coordinates analysis (PCO). It can be shown that PCA scores are a specific case of PCO scores, obtained using a quantitative similarity measure. The data come from a large-scale study of human sweat comprising nearly 1000 gas chromatography/mass spectrometry analyses of sweat from an isolated population of 200 individuals in Carinthia (Southern Austria), grouped into families and sampled once per fortnight over 10 weeks. The first step was to produce a peak table, which required peak detection, alignment, and integration. The peaks were reduced from 5080 to the 373 that occurred in at least 1 individual in 4 out of 5 fortnights. Both qualitative (presence/absence) and quantitative (equivalent to PCA) similarity measures can be computed. PCO and the Kolmogorov-Smirnov (KS) rank test are applied to these similarity matrices. It is shown that for this data set there is a reproducible individual fingerprint, which is best represented using the qualitative similarity measure, as assessed both by the Hotelling T² statistic applied to the PCO scores and by the probabilities associated with the KS rank test.
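
The relationship between PCO and PCA stated above can be illustrated with a minimal sketch. It assumes a Jaccard index as the qualitative (presence/absence) measure, which is only one of many possible choices and is not named in the abstract, and it uses small random data rather than the sweat peak table. The function names (`jaccard_dissimilarity`, `euclidean_distance`, `pco_scores`, `pca_scores`) are hypothetical and introduced here for illustration only.

```python
import numpy as np


def jaccard_dissimilarity(presence):
    """Pairwise Jaccard dissimilarity between rows of a presence/absence matrix."""
    p = np.asarray(presence, dtype=bool).astype(float)
    shared = p @ p.T                          # peaks present in both samples
    counts = p.sum(axis=1)
    union = counts[:, None] + counts[None, :] - shared
    with np.errstate(divide="ignore", invalid="ignore"):
        # Two samples with no peaks at all are treated as identical.
        similarity = np.where(union > 0, shared / union, 1.0)
    return 1.0 - similarity


def euclidean_distance(x):
    """Pairwise Euclidean distance between rows of a quantitative matrix."""
    x = np.asarray(x, dtype=float)
    diff = x[:, None, :] - x[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def pco_scores(dissimilarity, n_components=2):
    """Principal coordinates (classical scaling) of a dissimilarity matrix."""
    d = np.asarray(dissimilarity, dtype=float)
    n = d.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centering @ (d ** 2) @ centering   # double-centred matrix
    eigvals, eigvecs = np.linalg.eigh(b)
    order = np.argsort(eigvals)[::-1][:n_components]
    # Non-Euclidean measures can give small negative eigenvalues; clip them.
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale


def pca_scores(x, n_components=2):
    """PCA scores of column-centred data via the singular value decomposition."""
    x = np.asarray(x, dtype=float)
    u, s, _ = np.linalg.svd(x - x.mean(axis=0), full_matrices=False)
    return u[:, :n_components] * s[:n_components]


# Toy sparse peak table: 6 samples by 10 peaks, many zeros (illustrative only).
rng = np.random.default_rng(0)
intensities = rng.random((6, 10)) * (rng.random((6, 10)) < 0.4)

qualitative = pco_scores(jaccard_dissimilarity(intensities > 0))
quantitative = pco_scores(euclidean_distance(intensities))
```

With Euclidean distance as the quantitative measure, the PCO scores should match the PCA scores of the same data up to the sign of each component, whereas the Jaccard-based scores depend only on which peaks are present and not on their intensities.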