A two stage grading approach for feature selection and classification of microarray data using Pareto based feature ranking techniques: A case study

Abstract The high-dimensional search space of microarray data, with a large number of genes and only a few dozen samples, increases the complexity of analysing such databases. Not all genes are significant, so the informative genes must be extracted, which makes dimension reduction necessary. Ranking approaches are often used in the literature for feature selection. However, different ranking techniques may assign different ranks to the same gene, and a selection based on one set of ranks may not suit every problem. Relying on a single ranking technique may therefore reject some important genes and possibly select some insignificant ones, which may degrade the performance of the classifier. To overcome this problem, a bi-objective rank-based Pareto front technique is proposed. In this technique, two ranking techniques are used together to generate a Pareto-optimal solution consisting of a set of features. For the experimental work, 21 models based on 7 feature ranking strategies are considered. Eight different microarray datasets are used to find a suitable ranking combination. A grading method is used to rank the models, and a statistical test is performed to validate the findings.
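As a rough illustration of the bi-objective idea described in the abstract, the following minimal sketch keeps the features that are not dominated under two rankings, assuming a lower rank is better and both ranks are minimised. The function name `pareto_front`, the toy rank values, and the `ranker_k` labels are hypothetical and do not come from the paper. The last lines show one reading consistent with the stated counts, namely that pairing 7 ranking strategies two at a time gives 21 combinations; the abstract does not say explicitly how the 21 models are formed.

```python
from itertools import combinations


def pareto_front(rank_a, rank_b):
    """Return indices of features not dominated under two rankings.

    Assumption: lower rank is better. Feature j dominates feature i if it is
    no worse on both rankings and strictly better on at least one.
    """
    n = len(rank_a)
    front = []
    for i in range(n):
        dominated = any(
            rank_a[j] <= rank_a[i]
            and rank_b[j] <= rank_b[i]
            and (rank_a[j] < rank_a[i] or rank_b[j] < rank_b[i])
            for j in range(n)
        )
        if not dominated:
            front.append(i)
    return front


# Toy ranks for six genes under two hypothetical ranking techniques.
rank_a = [1, 2, 3, 4, 5, 6]
rank_b = [4, 2, 1, 6, 3, 5]
front = pareto_front(rank_a, rank_b)
print(front)  # indices of the non-dominated genes

# Placeholder names for 7 ranking strategies; pairing them gives the
# candidate two-ranker combinations.
rankers = [f"ranker_{k}" for k in range(1, 8)]
pairs = list(combinations(rankers, 2))
print(len(pairs))
```

In this toy example a gene ranked well by either technique survives as long as no other gene beats it on both, which is how a two-ranking selection can retain genes that a single ranking would discard.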
