Characterization of 1H NMR spectroscopic data and the generation of synthetic validation sets

MOTIVATION Common practice within the nuclear magnetic resonance (NMR) metabolomics community is to evaluate and validate novel algorithms on either empirical data or simplified simulated data. Empirical data capture the complex characteristics of real experiments, but the optimal or most correct analysis is unknown a priori; researchers are therefore forced to rely on indirect performance metrics, which are of limited value. A fair and complete comparison of competing techniques requires more exacting metrics. Thus, metabolomics researchers often evaluate their algorithms on simplified simulated data with a known answer. Unfortunately, conclusions obtained on simulated data are of value only if the data sets are complex enough for the results to generalize to true experimental data. Ideally, synthetic data should be indistinguishable from empirical data, yet retain a known best analysis.

RESULTS We have developed a technique for creating realistic synthetic metabolomics validation sets based on NMR spectroscopic data. The validation sets are developed by characterizing the salient distributions in sets of empirical spectroscopic data. Using this technique, several validation sets are constructed with a variety of characteristics present in 'real' data. A case study is then presented that compares the relative accuracy of several alignment algorithms, using the increased precision afforded by these synthetic data sets.

AVAILABILITY These data sets are available for download at http://birg.cs.wright.edu/nmr_synthetic_data_sets.
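As a rough illustration of the general idea in the abstract (characterize distributions in empirical spectra, then sample from them so that the true answer is known), the following minimal Python sketch may help. It is not the authors' method: the log-normal model for peak heights and widths, the Lorentzian line shape, the additive Gaussian noise, the peak list `empirical`, and every function name are assumptions made purely for illustration.

```python
import math
import random
import statistics


def fit_lognormal(values):
    """Return (mu, sigma) of log(values); an assumed stand-in for a fitted distribution."""
    logs = [math.log(v) for v in values]
    return statistics.mean(logs), statistics.stdev(logs)


def lorentzian(x, center, height, half_width):
    """Lorentzian line shape with the given peak height and half width at half maximum."""
    return height * half_width**2 / ((x - center) ** 2 + half_width**2)


def synthesize_spectrum(empirical_peaks, n_peaks, axis, noise_sd, rng):
    """Sample n_peaks from distributions fitted to empirical_peaks and render a spectrum.

    empirical_peaks: list of (position_ppm, height, half_width) tuples.
    Returns (intensities, true_peaks); true_peaks is the known answer.
    """
    positions = [p[0] for p in empirical_peaks]
    h_mu, h_sigma = fit_lognormal([p[1] for p in empirical_peaks])
    w_mu, w_sigma = fit_lognormal([p[2] for p in empirical_peaks])

    true_peaks = []
    for _ in range(n_peaks):
        # Assumption: positions are uniform over the observed range.
        center = rng.uniform(min(positions), max(positions))
        height = rng.lognormvariate(h_mu, h_sigma)
        half_width = rng.lognormvariate(w_mu, w_sigma)
        true_peaks.append((center, height, half_width))

    # Sum of Lorentzian peaks plus assumed additive Gaussian noise.
    intensities = [
        sum(lorentzian(x, c, h, w) for c, h, w in true_peaks) + rng.gauss(0.0, noise_sd)
        for x in axis
    ]
    return intensities, true_peaks


# Hypothetical "empirical" peak list, invented for this example only.
empirical = [(1.33, 40.0, 0.004), (2.05, 12.0, 0.006), (3.04, 25.0, 0.005), (3.93, 8.0, 0.007)]
axis = [0.5 + 0.001 * i for i in range(4001)]  # chemical shift axis, 0.5 to 4.5 ppm
spectrum, truth = synthesize_spectrum(
    empirical, n_peaks=6, axis=axis, noise_sd=0.05, rng=random.Random(0)
)
```

Because `truth` records the exact position, height, and width of every synthetic peak, an algorithm's output can be scored against it directly rather than through an indirect metric.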
