Gaussian binning: a new kernel-based method for processing NMR spectroscopic data for metabolomics

In many metabolomics studies, NMR spectra are divided into bins of fixed width. This spectral quantification technique, known as uniform binning, reduces the number of variables passed to pattern recognition techniques and mitigates the effects of variation in peak position. However, because the bin boundaries do not overlap, a small shift in a peak near a boundary can cause dramatic quantitative changes in the adjacent bins. Here we describe a new Gaussian binning method that uses overlapping bins to minimize these effects. A Gaussian kernel weights each signal contribution according to its distance from the bin center, and the overlap between bins is controlled by the kernel standard deviation. Sensitivity to peak shift was assessed on a series of test spectra in which the offset frequency was incremented in 0.5 Hz steps. For a 4 Hz shift within a bin width of 24 Hz, the error for uniform binning increased by 150%, while the error for Gaussian binning increased by 50%. Further, using a urinary metabolomics data set from a toxicity study and principal component analysis (PCA), we showed that the information content of the quantified features was equivalent for the Gaussian and uniform binning methods. The separation between groups in the PCA scores plot, measured by the J2 quality metric, is as good as or better for Gaussian binning than for uniform binning. The Gaussian method is thus shown to be robust with regard to peak shift while still retaining the information needed by classification and multivariate statistical techniques for NMR-based metabolomics data.
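The contrast between the two binning schemes can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the synthetic Lorentzian test peak, the choice of kernel standard deviation (half the bin width), and the relative-change error measure are all assumptions made for illustration, and the numbers it produces are not expected to match the percentages reported above.

```python
import numpy as np

def uniform_bin(freq, intensity, centers, width):
    """Sum the intensity inside non-overlapping bins of fixed width."""
    half = width / 2.0
    return np.array([
        intensity[(freq >= c - half) & (freq < c + half)].sum()
        for c in centers
    ])

def gaussian_bin(freq, intensity, centers, sigma):
    """Sum the intensity weighted by a Gaussian kernel centred on each bin."""
    # weights[i, j] is the kernel of bin i evaluated at frequency point j;
    # neighbouring kernels overlap, and sigma controls by how much.
    weights = np.exp(-0.5 * ((freq[None, :] - centers[:, None]) / sigma) ** 2)
    return weights @ intensity

def lorentzian(freq, position, half_width=1.0):
    """Synthetic test peak (assumed line shape, unit height)."""
    return half_width**2 / ((freq - position) ** 2 + half_width**2)

def relative_change(reference, shifted):
    """Norm of the change in the binned vector, relative to the reference."""
    return np.linalg.norm(shifted - reference) / np.linalg.norm(reference)

# Frequency axis in Hz and 24 Hz bins, as in the shift experiment.
freq = np.arange(0.0, 240.0, 0.1)
width = 24.0
centers = np.arange(width / 2.0, 240.0, width)
sigma = width / 2.0  # assumed value; the source does not state it here

# A peak 2 Hz below the bin boundary at 96 Hz, then shifted 4 Hz across it.
base = lorentzian(freq, 94.0)
moved = lorentzian(freq, 98.0)

err_uniform = relative_change(
    uniform_bin(freq, base, centers, width),
    uniform_bin(freq, moved, centers, width),
)
err_gauss = relative_change(
    gaussian_bin(freq, base, centers, sigma),
    gaussian_bin(freq, moved, centers, sigma),
)
print(f"uniform: {err_uniform:.3f}  gaussian: {err_gauss:.3f}")
```

In this sketch the shifted peak moves most of its area from one uniform bin to the next, whereas the two overlapping Gaussian bins on either side of the boundary each change by a smaller amount. A larger sigma increases the overlap and smooths the response further, at the cost of spectral resolution.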
