Strength Pareto evolutionary algorithm-based gene subset selection

Microarray gene expression data are voluminous, and only a few of the genes in such a dataset are informative for disease analysis. Selecting those genes from the whole dataset is a challenging task. Researchers have applied many optimization techniques to gene subset selection, but none of them provides a globally optimal solution for every gene dataset. In this paper, we propose a gene subset selection technique based on the strength Pareto evolutionary algorithm (SPEA) that selects an informative gene subset for analyzing and identifying disease efficiently. SPEA is a multi-objective optimization algorithm that explores the search space and produces a non-dominated Pareto front, from which an optimal gene subset is obtained. An external cluster validation index and the number of genes in a sample serve as the two objective functions: the chromosomes in the population are evaluated on these two objectives, and after the algorithm converges, the chromosomes in the non-dominated Pareto front give the important gene subset. Experimental results on the selected gene subsets demonstrate the usefulness of the method.
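
The abstract describes the evaluation step only in outline, so the following is a minimal sketch of how chromosomes could be scored and ranked, not the authors' implementation. It assumes each chromosome is a binary mask over the genes (1 = gene selected), that the external cluster validation index is higher-is-better (so it is negated to make both objectives minimized), and that fewer genes is preferred. The `validity` callable and the helper names `evaluate`, `dominates`, `covers`, `non_dominated` and `spea_fitness` are hypothetical. The fitness assignment follows the original SPEA of Zitzler and Thiele (1998), omitting its clustering-based truncation of the external set.

```python
from typing import Callable, List, Sequence, Tuple

# Objective vectors; every component is minimized.
Objectives = Tuple[float, ...]


def evaluate(mask: Sequence[int],
             validity: Callable[[List[int]], float]) -> Objectives:
    """Map a binary chromosome (1 = gene selected) to two minimized objectives.

    `validity` is a hypothetical callable returning an external cluster
    validation index for the selected genes (assumed: higher is better),
    so it is negated; the second objective is the number of selected genes.
    """
    selected = [g for g, bit in enumerate(mask) if bit]
    return (-validity(selected), float(len(selected)))


def dominates(a: Objectives, b: Objectives) -> bool:
    """a Pareto-dominates b: no worse in every objective, better in at least one."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def covers(a: Objectives, b: Objectives) -> bool:
    """SPEA's 'covers' relation: a dominates b or a equals b."""
    return dominates(a, b) or tuple(a) == tuple(b)


def non_dominated(points: Sequence[Objectives]) -> List[int]:
    """Indices of the points that no other point dominates (the Pareto front)."""
    return [i for i, p in enumerate(points)
            if not any(dominates(q, p) for j, q in enumerate(points) if j != i)]


def spea_fitness(population: Sequence[Objectives],
                 external: Sequence[Objectives]) -> Tuple[List[float], List[float]]:
    """SPEA fitness assignment (lower is better), without the clustering step.

    Strength of external member i: n_i / (N + 1), where n_i is the number of
    population members it covers and N is the population size. Fitness of a
    population member: 1 + the sum of strengths of external members covering it.
    """
    n = len(population)
    strength = [sum(covers(e, p) for p in population) / (n + 1) for e in external]
    fitness = [1.0 + sum(s for e, s in zip(external, strength) if covers(e, p))
               for p in population]
    return strength, fitness


if __name__ == "__main__":
    def toy_validity(genes: List[int]) -> float:
        # Hypothetical stand-in for a cluster validation index:
        # rewards selecting genes 0 and 1.
        return sum(g in (0, 1) for g in genes) / 2

    masks = [[1, 1, 1, 0], [1, 0, 1, 0], [0, 0, 1, 1]]
    pop = [evaluate(m, toy_validity) for m in masks]
    front = non_dominated(pop)
    external = [pop[i] for i in front]
    print(pop, front, spea_fitness(pop, external))
```

In the toy run, the third chromosome selects as many genes as the second but scores a lower index, so it is dominated and the front contains indices 0 and 1. In a full SPEA loop, the non-dominated chromosomes would be copied into the external set each generation and the fitness values above would drive selection, crossover and mutation.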
