Mass spectrometry cancer data classification using wavelets and genetic algorithm

This paper introduces a hybrid feature extraction method for cancer classification from mass spectrometry (MS) data. Haar wavelets transform the MS data into orthogonal wavelet coefficients, and a genetic algorithm (GA) then selects the most discriminative coefficients to form feature sets. The combination of wavelets and GA yields highly distinctive feature sets that serve as inputs to classification algorithms. Experimental results show that the wavelet-GA method is robust and significantly outperforms competing methods. The proposed method can therefore be applied in cancer classification models that serve as clinical decision support systems for medical practitioners.
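
The two stages described in the abstract can be illustrated with a minimal sketch. It assumes spectra whose length is a power of two, a binary class label, and a nearest-centroid accuracy score as the GA fitness. The fitness function, the GA operators (tournament selection, uniform crossover, bit-flip mutation, elitism), all parameter values, and the synthetic data are illustrative assumptions, not the configuration used in the paper.

```python
import numpy as np

def haar_dwt(signal):
    """Full multi-level orthonormal Haar transform of a length-2^k signal."""
    x = np.asarray(signal, dtype=float)
    if x.size == 0 or x.size & (x.size - 1):
        raise ValueError("signal length must be a power of two")
    details = []
    while x.size > 1:
        even, odd = x[0::2], x[1::2]
        details.append((even - odd) / np.sqrt(2.0))  # detail coefficients
        x = (even + odd) / np.sqrt(2.0)              # approximation
    # Coarsest approximation first, then details from coarse to fine.
    return np.concatenate([x] + details[::-1])

def fitness(mask, X, y):
    """Assumed fitness: nearest-centroid training accuracy on the selected
    coefficients, minus a small penalty on the fraction selected."""
    if not mask.any():
        return 0.0
    Z = X[:, mask]
    c0, c1 = Z[y == 0].mean(axis=0), Z[y == 1].mean(axis=0)
    closer_to_1 = np.linalg.norm(Z - c1, axis=1) < np.linalg.norm(Z - c0, axis=1)
    return (closer_to_1.astype(int) == y).mean() - 0.01 * mask.mean()

def ga_select(X, y, pop_size=30, generations=40, p_mut=0.05, seed=0):
    """Simple GA over boolean masks; returns the best mask found."""
    rng = np.random.default_rng(seed)
    n_feat = X.shape[1]
    pop = rng.random((pop_size, n_feat)) < 0.5
    for _ in range(generations):
        scores = np.array([fitness(m, X, y) for m in pop])
        elite = pop[scores.argmax()].copy()
        # Binary tournament selection.
        a, b = rng.integers(pop_size, size=(2, pop_size))
        parents = np.where((scores[a] >= scores[b])[:, None], pop[a], pop[b])
        # Uniform crossover with a randomly permuted partner.
        partners = parents[rng.permutation(pop_size)]
        take_parent = rng.random((pop_size, n_feat)) < 0.5
        children = np.where(take_parent, parents, partners)
        # Bit-flip mutation, then keep the previous best (elitism).
        children ^= rng.random((pop_size, n_feat)) < p_mut
        children[0] = elite
        pop = children
    scores = np.array([fitness(m, X, y) for m in pop])
    return pop[scores.argmax()]

# Synthetic stand-in for MS data: class 1 has a raised region at bins 16..23.
rng = np.random.default_rng(1)
spectra = rng.normal(size=(40, 64))
labels = np.repeat([0, 1], 20)
spectra[labels == 1, 16:24] += 5.0

coeffs = np.array([haar_dwt(s) for s in spectra])
mask = ga_select(coeffs, labels)
print("selected coefficient indices:", np.flatnonzero(mask))
```

The selected columns `coeffs[:, mask]` would then be passed to a classifier. In practice the fitness should be estimated with cross-validation rather than training accuracy, to limit overfitting of the selected subset.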
