A fuzzy based feature selection from independent component subspace for machine learning classification of microarray data

Feature (gene) selection and classification of microarray data are two of the most interesting machine learning challenges. The present work combines two existing feature extraction/selection algorithms, independent component analysis (ICA) and fuzzy backward feature elimination (FBFE), a pairing that has not been used before. The main objective of this paper is to select independent components of DNA microarray data using FBFE in order to improve the performance of support vector machine (SVM) and Naïve Bayes (NB) classifiers while keeping the computational cost affordable. To validate the proposed method, it is applied to reduce the number of genes in five DNA microarray datasets: colon cancer, acute leukemia, prostate cancer, lung cancer II, and high-grade glioma. The reduced datasets are then classified using the SVM and NB classifiers. Experimental results on these five datasets demonstrate that the genes selected by the proposed approach effectively improve the classification accuracy of both classifiers. We compare the proposed method with principal component analysis (PCA), a standard extraction algorithm, and find that it obtains better classification accuracy with both SVM and NB classifiers while using fewer selected genes than PCA. For each dataset, the curve of average error rate against number of genes indicates how many genes are required to reach the highest accuracy with the proposed method for both classifiers. Receiver operating characteristic (ROC) curves show the best subset of genes for both classifiers on the different datasets with the proposed method.
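The elimination step can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not specify how FBFE scores features, so the sketch assumes a De Luca-Termini fuzzy entropy computed on a simple similarity between each sample and a per-class mean vector, and it assumes the independent components have already been extracted and scaled to [0, 1]. The names `fuzzy_entropy` and `backward_eliminate` are hypothetical.

```python
import math


def fuzzy_entropy(mu):
    """De Luca-Termini fuzzy entropy of membership values in [0, 1]."""
    total = 0.0
    for m in mu:
        if 0.0 < m < 1.0:  # values of exactly 0 or 1 contribute nothing
            total -= m * math.log(m) + (1.0 - m) * math.log(1.0 - m)
    return total


def backward_eliminate(X, y, n_keep):
    """Drop the highest-entropy column until n_keep remain.

    X: list of rows with values scaled to [0, 1] (assumed to be
       independent components, one column per component).
    y: class label for each row.
    Returns the indices of the columns that were kept.
    """
    n_features = len(X[0])
    classes = sorted(set(y))

    # Assumption: the "ideal" vector of a class is its per-column mean.
    ideals = {}
    for c in classes:
        rows = [x for x, label in zip(X, y) if label == c]
        ideals[c] = [sum(r[j] for r in rows) / len(rows)
                     for j in range(n_features)]

    # Assumption: similarity is 1 - |x - ideal|; entropy is summed over
    # all samples and classes for each column.
    entropy = {}
    for j in range(n_features):
        sims = [1.0 - abs(x[j] - ideals[c][j]) for x in X for c in classes]
        entropy[j] = fuzzy_entropy(sims)

    kept = list(range(n_features))
    while len(kept) > n_keep:
        worst = max(kept, key=lambda j: entropy[j])  # least informative
        kept.remove(worst)
    return kept


# Toy data: column 0 separates the classes, column 1 does not.
X_toy = [[0.0, 0.5], [0.0, 0.4], [1.0, 0.6], [1.0, 0.5]]
y_toy = [0, 0, 1, 1]
print(backward_eliminate(X_toy, y_toy, n_keep=1))
```

In a full pipeline along the lines the abstract describes, the retained columns would then be passed to the SVM and NB classifiers, and the number of columns to keep would be chosen from the error-rate curve.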
