Wavelet selection for disease classification by DNA microarray data

Microarrays measure the expression levels of tens of thousands of genes, and the resulting high-dimensional feature vector contains information that is irrelevant for accurate classification. Moreover, only a few training samples are available, so feature reduction should be performed before the classification step to avoid the curse of dimensionality. Here, we propose using sets of orthogonal wavelet detail coefficients, obtained from different mother wavelets, to extract features from the microarray data. We propose a multi-classifier system in which each classifier, a support vector machine, is trained on a different set of detail coefficients; the classifiers are combined by the ''sum rule''. The sets of detail coefficients are selected by Sequential Forward Floating Selection (SFFS). The proposed method is validated using the area under the ROC curve as the performance indicator, with experiments carried out on four datasets: Breast, Ovarian, Lung, and Prostate. The results show that the proposed method outperforms what can be obtained with a single set of detail coefficients. Moreover, we show that, even when the detail coefficients are used as features, a random subspace of classifiers outperforms the stand-alone classifiers.
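
To make the building blocks named above concrete, the following is a minimal sketch, not the authors' implementation. It assumes the Haar wavelet as one possible mother wavelet (the paper uses several), and the function names `haar_detail`, `sum_rule`, and `auc` are hypothetical. SVM training and the SFFS search are omitted; the sketch only shows how detail coefficients could be extracted from an expression vector, how per-classifier scores could be fused by the sum rule, and how the area under the ROC curve could be computed.

```python
import math


def haar_detail(signal, level=1):
    """Return the Haar detail coefficients at the given decomposition level.

    Assumption: Haar is used here only because it is the simplest
    orthogonal wavelet; any other mother wavelet could take its place.
    """
    approx = list(signal)
    detail = []
    for _ in range(level):
        if len(approx) % 2:
            # Pad odd-length inputs by repeating the last value.
            approx.append(approx[-1])
        pairs = list(zip(approx[0::2], approx[1::2]))
        detail = [(a - b) / math.sqrt(2) for a, b in pairs]
        approx = [(a + b) / math.sqrt(2) for a, b in pairs]
    return detail


def sum_rule(score_lists):
    """Fuse classifiers by summing their scores sample by sample."""
    return [sum(scores) for scores in zip(*score_lists)]


def auc(scores, labels):
    """Area under the ROC curve by pairwise comparison (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy expression vector; real microarray vectors have thousands of entries.
    print(haar_detail([1.0, 1.0, 3.0, 5.0], level=1))
    # Hypothetical scores from two classifiers on four samples.
    fused = sum_rule([[0.9, 0.2, 0.4, 0.1], [0.7, 0.6, 0.8, 0.2]])
    print(auc(fused, [1, 0, 1, 0]))
```

In a fuller pipeline, each set of detail coefficients (one per wavelet and level) would feed its own support vector machine, and SFFS would choose which sets to keep by comparing the AUC of the fused scores.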
