Feature selection and classification of high-resolution NMR spectra in the complex wavelet transform domain

Identifying the important metabolite features in high-resolution nuclear magnetic resonance (NMR) spectra is crucial for discovering biomarkers that could enable early diagnosis of disease and subsequent monitoring of its progression. Although a number of traditional feature extraction and selection methods are available, most of them operate in the original frequency domain and disregard the fact that an NMR spectrum comprises many local bumps and peaks at different scales. In the present study, a complex wavelet transform, which handles multiscale information efficiently and whose coefficient energy is insensitive to shifts in the signal, is proposed as a method to improve feature extraction and classification in NMR spectra. Furthermore, a multiple testing procedure based on the false discovery rate (FDR) is used to identify important metabolite features in the complex wavelet domain. Experimental results with real NMR spectra show that classification models constructed with the complex wavelet coefficients selected by the FDR-based procedure yield lower misclassification rates than models constructed with either the original features or conventional wavelet coefficients.
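
As a rough illustration of FDR-based feature selection, the sketch below applies the Benjamini-Hochberg step-up procedure to a list of p-values. It assumes, without confirmation from the abstract, that each wavelet coefficient has already been given a p-value by some two-group test and that the Benjamini-Hochberg variant is the one intended. The function name `benjamini_hochberg`, the level `q`, and the example p-values are hypothetical and chosen only for illustration.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return sorted indices of hypotheses rejected by the BH step-up procedure.

    Assumption: p_values[i] is the p-value of a per-coefficient test;
    how those p-values are obtained is outside the scope of this sketch.
    """
    m = len(p_values)
    if m == 0:
        return []
    # Rank hypotheses from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) with p_(k) <= (k / m) * q.
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            cutoff = rank
    # Reject every hypothesis ranked at or below the cutoff.
    return sorted(order[:cutoff])


# Hypothetical p-values, one per coefficient (illustrative only).
p_vals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]
selected = benjamini_hochberg(p_vals, q=0.05)
print(selected)  # indices of the coefficients kept as features
```

Under these assumptions, the returned indices would identify the coefficients passed on to the classifier. Because the procedure is step-up, a coefficient whose p-value misses its own threshold is still retained when a higher-ranked p-value meets its threshold.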
