Genetic algorithm-based feature selection in high-resolution NMR spectra

High-resolution nuclear magnetic resonance (NMR) spectroscopy has provided a new means of detecting and recognizing metabolic changes in biological systems in response to pathophysiological stimuli and to the intake of toxins or nutrients. To identify meaningful patterns in NMR spectra, various statistical pattern recognition methods have been applied to reduce their complexity and uncover implicit metabolic patterns. In this paper, we present a genetic algorithm (GA)-based feature selection method that identifies the metabolite features contributing most to the discrimination of samples from different conditions in high-resolution NMR spectra. In addition, an orthogonal signal filter is applied as a preprocessing step to remove variation in the spectra that is unrelated to the discrimination of different conditions. Results from k-nearest neighbors and partial least squares discriminant analysis on experimental NMR spectra from human plasma show the potential advantage of features obtained by GA-based feature selection combined with an orthogonal signal filter.
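
The general idea of GA-based feature selection scored by a k-nearest neighbors classifier can be illustrated with a minimal sketch. This is not the authors' implementation: the fitness function (leave-one-out k-NN accuracy with a small penalty on subset size), the operators (tournament selection, uniform crossover, bit-flip mutation, elitism), all parameter values, and the synthetic data are assumptions chosen for illustration, and the orthogonal signal filter is omitted.

```python
import numpy as np

def knn_loo_accuracy(X, y, k=3):
    """Leave-one-out accuracy of a k-nearest-neighbors classifier."""
    # Pairwise squared Euclidean distances between samples.
    d = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d, np.inf)  # a sample may not vote for itself
    nearest = np.argsort(d, axis=1)[:, :k]
    # Majority vote over the labels of the k nearest neighbors.
    pred = np.array([np.bincount(v).argmax() for v in y[nearest]])
    return float((pred == y).mean())

def fitness(mask, X, y, k=3, penalty=0.001):
    """Assumed fitness: k-NN accuracy minus a small cost per selected feature."""
    if not mask.any():
        return 0.0
    return knn_loo_accuracy(X[:, mask], y, k) - penalty * mask.sum()

def ga_select(X, y, pop_size=30, generations=40, p_mut=0.02, seed=0):
    """Search for a boolean feature mask with a simple genetic algorithm."""
    rng = np.random.default_rng(seed)
    n_feat = X.shape[1]
    # Each chromosome is a boolean mask; start with about 20% of features on.
    pop = rng.random((pop_size, n_feat)) < 0.2
    best_mask, best_fit = None, -np.inf
    for _ in range(generations):
        fits = np.array([fitness(m, X, y) for m in pop])
        i = fits.argmax()
        if fits[i] > best_fit:
            best_fit, best_mask = fits[i], pop[i].copy()
        # Binary tournament selection.
        a = rng.integers(pop_size, size=pop_size)
        b = rng.integers(pop_size, size=pop_size)
        parents = pop[np.where(fits[a] >= fits[b], a, b)]
        # Uniform crossover with a randomly shuffled partner.
        partners = parents[rng.permutation(pop_size)]
        cross = rng.random((pop_size, n_feat)) < 0.5
        children = np.where(cross, parents, partners)
        # Bit-flip mutation.
        children ^= rng.random((pop_size, n_feat)) < p_mut
        children[0] = best_mask  # elitism: keep the best mask seen so far
        pop = children
    return best_mask, best_fit

# Synthetic stand-in for binned spectra: 40 samples, 30 bins, 2 informative bins.
rng = np.random.default_rng(1)
y = np.repeat([0, 1], 20)
X = rng.normal(size=(40, 30))
X[:, 3] += 3.0 * y
X[:, 17] -= 3.0 * y

mask, score = ga_select(X, y)
print("selected bins:", np.flatnonzero(mask), "fitness:", round(score, 3))
```

Because the same samples are used both to select features and to score them, the reported fitness is optimistic; an honest estimate would require a held-out set or an outer cross-validation loop.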
