Pacific Symposium on Biocomputing 12:115-126 (2007)

LEVERAGING LATENT INFORMATION IN NMR SPECTRA FOR ROBUST PREDICTIVE MODELS

A significant challenge in metabolomics experiments is extracting biologically meaningful data from complex spectral information. In this paper we compare two techniques for representing 1D NMR spectra: “Spectral Binning” and “Targeted Profiling”. We use simulated 1D NMR spectra with specific characteristics to assess the quality of predictive multivariate statistical models built using both data representations. We also assess the effect of different variable scaling techniques on the two data representations. We demonstrate that models built using Targeted Profiling are not only more interpretable than Spectral Binning models, but are also more robust with respect to compound overlap and variability in solution conditions (such as pH and ionic strength). Our findings from the synthetic dataset were validated using a real-world dataset.
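To make the Spectral Binning representation and the role of variable scaling concrete, the following is a minimal sketch rather than the authors' implementation. The abstract does not state the bin width or which scaling methods were compared, so the 0.04 ppm default and the autoscaling, Pareto, and mean-centering options are assumptions chosen for illustration, and the function names `bin_spectrum` and `scale` are hypothetical. Targeted Profiling is not sketched here because it depends on a library of reference compound signatures.

```python
import numpy as np


def bin_spectrum(ppm, intensity, width=0.04):
    """Sum intensities into fixed-width ppm bins.

    The 0.04 ppm default is an assumed value, not one taken from the paper.
    Returns the bin edges and the summed intensity per bin.
    """
    ppm = np.asarray(ppm, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    lo = ppm.min()
    n_bins = max(1, int(np.ceil((ppm.max() - lo) / width)))
    edges = lo + width * np.arange(n_bins + 1)
    # Map each point to a bin index; clip so the largest ppm falls in the last bin.
    idx = np.clip(((ppm - lo) / width).astype(int), 0, n_bins - 1)
    binned = np.bincount(idx, weights=intensity, minlength=n_bins)
    return edges, binned


def scale(X, method="auto"):
    """Column-wise scaling of a samples-by-variables matrix.

    method="auto":   mean-center, divide by the standard deviation
    method="pareto": mean-center, divide by the square root of the standard deviation
    method="center": mean-center only
    These three options are assumptions; the abstract does not name the methods.
    """
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    sd = X.std(axis=0, ddof=1)
    sd[sd == 0] = 1.0  # leave constant variables unscaled instead of dividing by zero
    if method == "auto":
        return centered / sd
    if method == "pareto":
        return centered / np.sqrt(sd)
    if method == "center":
        return centered
    raise ValueError(f"unknown scaling method: {method}")
```

Binning each spectrum and stacking the results row by row gives the samples-by-variables matrix that `scale` expects. Because a bin sums whatever signal falls inside it, overlapping compounds or peak shifts caused by pH and ionic strength change the bin values directly, which is the sensitivity the abstract contrasts with Targeted Profiling.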
