Swarm intelligence based wavelet coefficient feature selection for mass spectral classification: an application to proteomics data.

This paper introduces the ant colony algorithm (ACA), a swarm intelligence based optimization method, as a new feature selection method that selects informative wavelet coefficients from mass spectral data for ovarian cancer diagnostics. After determining appropriate parameters for the ACA-based search, we ran the feature search 100 times with the number of selected features fixed at five. The results show that: (1) the five wavelet coefficients selected in a run can reach 100% classification accuracy on the training, validating, and independent testing sets; (2) the eight wavelet coefficients most frequently selected across the 100 runs give 100% accuracy on the training set, 100% on the validating set, and 98.8% on the independent testing set, which indicates that the proposed feature selection method is robust and accurate; and (3) the mass spectral data corresponding to these eight coefficients can be located by the inverse wavelet transform, and these located mass spectral data still maintain high classification accuracy (100% for the training set, 97.6% for the validating set, and 98.8% for the testing set) while carrying physical and medical meaning that can inform future studies of ovarian cancer mechanisms. Furthermore, these mass spectral features (potential biomarkers) agree well with other studies that used the same sample set. Together, these results suggest that this feature extraction strategy could benefit the development of intelligent, real-time diagnosis and monitoring systems based on spectroscopy instrumentation.
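The search procedure summarized above (wavelet coefficients as candidate features, repeated ACA runs that each pick five coefficients, then ranking coefficients by how often they were selected) can be sketched as follows. This is a minimal, hypothetical illustration, not the authors' implementation. Assumptions: an orthonormal Haar transform stands in for whatever wavelet the study used; the fitness is nearest-centroid accuracy on a validation set rather than the paper's classifier; ants choose features from pheromone alone (no heuristic term); and every function name, parameter value (`n_ants`, `n_iter`, evaporation rate `rho`) and the toy 16-point spectra are invented for the example.

```python
import random
from collections import Counter
from functools import partial


def haar_dwt(signal, levels):
    """Multi-level orthonormal Haar DWT (an assumed stand-in wavelet).

    Returns flattened coefficients [approx_L, detail_L, ..., detail_1];
    len(signal) must be divisible by 2**levels.
    """
    s = 2 ** -0.5
    approx, details = list(signal), []
    for _ in range(levels):
        pairs = range(0, len(approx), 2)
        detail = [(approx[i] - approx[i + 1]) * s for i in pairs]
        approx = [(approx[i] + approx[i + 1]) * s for i in pairs]
        details.insert(0, detail)  # coarsest detail level ends up first
    return approx + [c for d in details for c in d]


def centroid_accuracy(subset, train, val):
    """Hypothetical fitness: nearest-centroid accuracy on `val`, using only
    the feature indices in `subset`. Samples are (vector, label) with labels
    0/1; both classes must be present in `train`."""
    cents = {}
    for lab in (0, 1):
        rows = [x for x, y in train if y == lab]
        cents[lab] = [sum(r[j] for r in rows) / len(rows) for j in subset]

    def predict(x):
        return min(cents, key=lambda c: sum((x[j] - m) ** 2
                                            for j, m in zip(subset, cents[c])))

    return sum(predict(x) == y for x, y in val) / len(val)


def aco_select(n_features, k, fitness, n_ants=10, n_iter=20, rho=0.2, seed=0):
    """Simplified ant-colony search for a k-feature subset maximizing `fitness`.

    Each ant draws k distinct features with probability proportional to
    pheromone; pheromone then evaporates at rate `rho` and the iteration-best
    subset deposits an amount equal to its fitness.
    """
    rng = random.Random(seed)
    tau = [1.0] * n_features  # uniform initial pheromone
    best, best_fit = None, -1.0
    for _ in range(n_iter):
        iter_best, iter_fit = None, -1.0
        for _ in range(n_ants):
            pool, subset = list(range(n_features)), []
            for _ in range(k):
                j = rng.choices(pool, weights=[tau[i] for i in pool])[0]
                pool.remove(j)  # sample without replacement
                subset.append(j)
            f = fitness(subset)
            if f > iter_fit:
                iter_best, iter_fit = sorted(subset), f
        tau = [(1 - rho) * t for t in tau]  # evaporation
        for j in iter_best:
            tau[j] += iter_fit  # reinforce the iteration-best subset
        if iter_fit > best_fit:
            best, best_fit = iter_best, iter_fit
    return best, best_fit


def popular_features(n_runs, top_m, **aco_kwargs):
    """Run the search n_runs times (one seed per run) and return the top_m
    most frequently selected feature indices."""
    counts = Counter()
    for run in range(n_runs):
        subset, _ = aco_select(seed=run, **aco_kwargs)
        counts.update(subset)
    return [j for j, _ in counts.most_common(top_m)]


def make_spectra(n_per_class, rng):
    """Hypothetical toy data: 16 points of Gaussian noise; class 1 adds a
    peak at points 4-5. Returns (Haar coefficients, label) pairs."""
    data = []
    for label in (0, 1):
        for _ in range(n_per_class):
            x = [rng.gauss(0.0, 0.5) for _ in range(16)]
            if label == 1:
                x[4] += 3.0
                x[5] += 3.0
            data.append((haar_dwt(x, levels=2), label))
    return data


rng = random.Random(42)
train, val = make_spectra(20, rng), make_spectra(20, rng)
fit = partial(centroid_accuracy, train=train, val=val)
top = popular_features(n_runs=30, top_m=2, n_features=16, k=5, fitness=fit)
print("most frequently selected coefficients:", top)
```

With `levels=2`, `haar_dwt` places the level-2 approximation at indices 0-3, the level-2 details at 4-7, and the level-1 details at 8-15. The toy peak at points 4-5 therefore shifts indices 1 and 5 (both cover points 4-7), while the level-1 detail for that pair (index 10) is unaffected because both points rise equally. Those two indices are what one would expect to dominate the count. Ranking by selection frequency over many seeded runs, instead of trusting a single run's subset, mirrors the abstract's use of the eight most frequently selected coefficients.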
