An efficient spectra processing method for metabolite identification from 1H-NMR metabolomics data

Abstract

The spectra processing step is crucial in metabolomics approaches, especially for proton NMR metabolomics profiling. During this step, noise reduction, baseline correction, peak alignment and reduction of the 1D 1H-NMR spectral data are required so that biological information can be highlighted through further statistical analyses. Above all, data reduction (binning or bucketing) strongly impacts subsequent statistical data analysis and potential biomarker discovery. Here, we propose an efficient spectra processing method that also supports compound identification, using a new data reduction algorithm that produces relevant variables, called buckets. These buckets result from the extraction of all relevant peaks contained in the complex mixture spectra, free of any non-significant signal. Taking advantage of the concentration variability of each compound across a series of samples, and based on the significant correlations that link these buckets together into clusters, the method further proposes automatic assignment of metabolites by matching these clusters with the spectra of reference compounds from the Human Metabolome Database or a home-made database. This new method is applied to a set of simulated 1H-NMR spectra to determine the effect of some processing parameters and, as a proof of concept, to a tomato 1H-NMR dataset to test its ability to recover the fruit extract compositions. The implementation code for both the clustering and matching steps is available upon request from the corresponding author.

Figure: Illustration of the processing approach, from spectra bucketing to the proposal of candidate compounds, using a set of six simulated NMR spectra. First, the ERVA method of data reduction is applied to the spectra after noise processing, generating buckets as shown for two spectral regions. Second, the correlation matrix between bucket intensities is computed and a correlation threshold is applied for bucket clustering. The cluster shown gathers two sub-clusters (A and B), each intra-connected with higher correlations (r > 0.996) than the interconnections between them (r < 0.994). Third, matching the cluster against a reference compound library provides a list of candidate compounds. Last, for validation, the reference spectrum of proline is shown with the corresponding matched regions highlighted.
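The clustering and matching steps described above can be illustrated with a minimal sketch. It is not the authors' implementation (which is available only on request); it assumes that buckets are linked whenever their Pearson correlation across samples reaches a threshold, that clusters are the connected components of the resulting graph, and that a candidate compound is scored by the fraction of its reference peaks found within a ppm tolerance. The function names, the threshold, the tolerance, the compound names and all numeric values are hypothetical.

```python
import numpy as np


def cluster_buckets(intensities, threshold=0.99):
    """Group buckets whose intensities correlate at or above `threshold`.

    intensities: array-like of shape (n_samples, n_buckets).
    Returns a list of clusters, each a sorted list of bucket indices.
    Assumption: clusters are connected components of the thresholded
    correlation graph (single linkage); singletons are discarded.
    """
    corr = np.corrcoef(intensities, rowvar=False)  # bucket-by-bucket Pearson r
    linked = corr >= threshold
    seen, clusters = set(), []
    for start in range(linked.shape[0]):
        if start in seen:
            continue
        # Depth-first search over the thresholded correlation graph.
        stack, members = [start], []
        seen.add(start)
        while stack:
            i = stack.pop()
            members.append(i)
            for j in np.flatnonzero(linked[i]):
                j = int(j)
                if j not in seen:
                    seen.add(j)
                    stack.append(j)
        if len(members) > 1:
            clusters.append(sorted(members))
    return clusters


def match_cluster(cluster_ppm, library, tol=0.02):
    """Rank reference compounds by the fraction of their peaks in the cluster.

    cluster_ppm: chemical shifts (ppm) of the buckets in one cluster.
    library: dict mapping compound name -> list of reference peak shifts.
    Assumption: a reference peak is matched if any bucket lies within `tol` ppm.
    """
    scores = {}
    for name, ref_ppm in library.items():
        hits = sum(any(abs(p - q) <= tol for q in cluster_ppm) for p in ref_ppm)
        scores[name] = hits / len(ref_ppm)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy data: six noise-free samples, two hypothetical compounds whose
# concentrations vary independently; each compound contributes to
# several buckets in fixed proportions.
conc_x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
conc_y = np.array([5.0, 1.0, 4.0, 2.0, 6.0, 3.0])
intensities = np.column_stack(
    [2.0 * conc_x, 1.0 * conc_x, 0.5 * conc_x, 3.0 * conc_y, 1.5 * conc_y]
)
bucket_ppm = np.array([4.13, 3.37, 2.05, 3.56, 1.48])  # hypothetical shifts

library = {  # hypothetical reference peak lists, not taken from any database
    "compound_X": [4.12, 3.38, 2.06],
    "compound_Y": [3.55, 1.47],
}

clusters = cluster_buckets(intensities, threshold=0.99)
for members in clusters:
    print(members, match_cluster(bucket_ppm[members], library))
```

With these toy inputs, the first three buckets form one cluster and the last two form another, and each cluster ranks its own compound first. Real spectra contain noise and overlapping peaks, so the threshold and tolerance would need tuning, and a stricter second threshold could be applied within a cluster to separate sub-clusters such as A and B in the figure.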
