Bag of Peaks: interpretation of NMR spectrometry

MOTIVATION: The analysis of high-resolution proton nuclear magnetic resonance (NMR) spectra can help human experts implicate metabolites expressed in diseased biofluids. Here, we explore an intermediate representation, between spectral trace and classifier, that provides a communicative interface between expert and machine. This representation permits classification accuracies equivalent to, or better than, those of either principal component analysis (PCA) or multi-dimensional scaling (MDS). In the training phase, the peaks in each trace are detected and clustered to compile a common dictionary, which can be visualized and adjusted by an expert. The dictionary is then used to characterize each trace with a fixed-length feature vector, termed Bag of Peaks, ready to be classified with classical supervised methods.

RESULTS: Our small-scale study, concerning Type I diabetes in Sardinian children, provides a preliminary indication of the effectiveness of the Bag of Peaks approach over standard PCA and MDS. Consistently higher classification accuracies are obtained once a sufficient number of peaks (>10) is included in the dictionary. A large-scale simulation of noisy spectra further confirms this advantage. Finally, suggestions for metabolite-peak loci that may be implicated in the disease are obtained by applying standard feature selection techniques.
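The pipeline outlined in the abstract (detect peaks, cluster them into a dictionary, encode each trace as a fixed-length vector) can be illustrated with a minimal sketch. The abstract does not say which peak detector, clustering algorithm, or encoding rule is used, so everything below is an assumption made for illustration: the function names `detect_peaks`, `build_dictionary` and `bag_of_peaks` are hypothetical, peak detection is a simple thresholded local-maximum test, the dictionary is built with 1-D k-means over pooled peak positions, and each vector entry sums the heights of the peaks nearest to that dictionary entry.

```python
from typing import List, Tuple

Peak = Tuple[float, float]  # (position along the trace, peak height)


def detect_peaks(trace: List[float], threshold: float) -> List[Peak]:
    """Assumed detector: local maxima whose height exceeds `threshold`."""
    peaks = []
    for i in range(1, len(trace) - 1):
        if (trace[i] > threshold
                and trace[i] > trace[i - 1]
                and trace[i] >= trace[i + 1]):
            peaks.append((float(i), trace[i]))
    return peaks


def nearest(pos: float, centres: List[float]) -> int:
    """Index of the centre closest to `pos`."""
    return min(range(len(centres)), key=lambda c: abs(pos - centres[c]))


def build_dictionary(positions: List[float], k: int, iters: int = 50) -> List[float]:
    """Assumed clustering: 1-D k-means over peak positions pooled from all
    training traces. Requires a non-empty `positions` list and k >= 1."""
    pts = sorted(positions)
    # Deterministic initialisation at evenly spaced quantiles of the positions.
    centres = [pts[(2 * j + 1) * len(pts) // (2 * k)] for j in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in pts:
            groups[nearest(p, centres)].append(p)
        # Empty clusters keep their previous centre.
        updated = [sum(g) / len(g) if g else centres[j]
                   for j, g in enumerate(groups)]
        if updated == centres:
            break
        centres = updated
    return sorted(centres)


def bag_of_peaks(peaks: List[Peak], dictionary: List[float]) -> List[float]:
    """Fixed-length vector: summed height of the peaks assigned to each entry."""
    vec = [0.0] * len(dictionary)
    for pos, height in peaks:
        vec[nearest(pos, dictionary)] += height
    return vec


# Two toy traces, each with two peaks at slightly shifted positions.
traces = [
    [0, 1, 5, 1, 0, 0, 0, 2, 8, 2, 0],
    [0, 0, 1, 6, 1, 0, 0, 3, 9, 3, 0],
]
all_peaks = [detect_peaks(t, threshold=0.5) for t in traces]
pooled = [pos for peaks in all_peaks for pos, _ in peaks]
dictionary = build_dictionary(pooled, k=2)
features = [bag_of_peaks(peaks, dictionary) for peaks in all_peaks]
```

Every trace maps to a vector of length `k` regardless of how many peaks it contains, so the resulting `features` can be passed to any standard supervised classifier. Because the dictionary entries are peak positions, they remain directly inspectable and adjustable by an expert, which is the interface role the abstract describes.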
