An advanced clustering approach for assessing the repeatability and statistical relevance of 2D-COSY spectra

NMR techniques are widely used together with multivariate analysis to characterize perturbations in metabolic pathways that occur during biological processes. A large body of recent scientific and statistical work is available on 1D spectra (principally ¹H-NMR spectra). More recently, two-dimensional NMR spectroscopy techniques have been investigated, both homonuclear (COSY, …) and heteronuclear (HSQC, …). Users such as biologists and pharmacologists commonly accept that the recent introduction of 2D-NMR methods represents a major qualitative leap for metabolomics investigations in terms of metabolite and biomarker identification. Indeed, it seems obvious that an additional dimension means more predictive power. Until now, however, no statistical study has clearly proven this assumption. A fundamental question therefore arises: “Is supplementary information equivalent to relevant and crucial information?” To extend the statistical properties and tools developed for 1D spectroscopy to the new challenges raised by 2D spectra, a rigorous study of the repeatability of 2D-NMR spectra is needed as a prerequisite. In the context of initial homonuclear COSY experiments, we present a methodology based on accurate multivariate clustering tools. Numerical quality indices and graphical clustering results are shown, obtained from binary vectors of peak positions, from recoded intensity vectors, and at different levels of spectral resolution. A second objective is to compare these 2D results with the corresponding 1D results (¹H-NMR) obtained under the same conditions. The methodology was applied to two real datasets (peak lists) corresponding to two different experimental designs: first, a four-mixture cell culture system containing various supervised metabolites, and second, a human-serum-based design with time repetitions and multiple permutations.
Our preliminary results are already promising: COSY appears to be a statistically robust tool and, furthermore, the additional information it provides appears to be relevant.
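
As an illustration only, the idea of clustering binary position vectors at different spectral resolutions can be sketched as follows. This is a minimal sketch under stated assumptions, not the procedure used in the study: the abstract does not name a distance or a linkage rule, so Jaccard distance and average linkage are assumed here, and the function names, the bin width parameter `resolution`, and the toy peak lists are all hypothetical.

```python
from itertools import combinations


def binarize(peaks, resolution):
    """Map a 2D peak list [(f1_ppm, f2_ppm), ...] to its set of occupied grid cells.

    The set is a sparse encoding of a binary presence/absence vector.
    `resolution` is the bin width in ppm (an assumed parameter).
    """
    return {(round(f1 / resolution), round(f2 / resolution)) for f1, f2 in peaks}


def jaccard_distance(a, b):
    """Jaccard distance between two sets of occupied cells (assumed metric)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def average_linkage(cells, n_clusters):
    """Naive agglomerative clustering with average linkage (assumed rule).

    Returns clusters as lists of spectrum indices.
    """
    clusters = [[i] for i in range(len(cells))]

    def dist(c1, c2):
        total = sum(jaccard_distance(cells[i], cells[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        # Merge the closest pair of clusters; a < b, so deleting b is safe.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: dist(clusters[p[0]], clusters[p[1]]),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Invented toy peak lists: two mixtures, two replicates each (not real data).
spectra = [
    [(1.33, 4.11), (2.03, 3.77), (3.02, 3.93)],  # mixture A, replicate 1
    [(1.34, 4.10), (2.04, 3.78), (3.03, 3.92)],  # mixture A, replicate 2
    [(0.92, 2.21), (1.48, 3.78), (7.32, 7.41)],  # mixture B, replicate 1
    [(0.93, 2.20), (1.47, 3.79), (7.31, 7.42)],  # mixture B, replicate 2
]

coarse = [binarize(s, resolution=0.1) for s in spectra]
fine = [binarize(s, resolution=0.01) for s in spectra]

clusters = average_linkage(coarse, n_clusters=2)
print(clusters)  # [[0, 1], [2, 3]]: replicates of each mixture group together

# Replicate distance depends on the grid: small shifts vanish on the coarse
# grid but separate every peak on the fine one.
d_coarse = jaccard_distance(coarse[0], coarse[1])
d_fine = jaccard_distance(fine[0], fine[1])
print(d_coarse, d_fine)  # 0.0 1.0
```

On these toy lists the replicates coincide at a 0.1 ppm bin width and share no cell at 0.01 ppm, which is why repeatability has to be examined at several levels of spectral resolution.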