A Study on the Relevance of Feature Selection Methods in Microarray Data

This paper studies the relevance of feature selection algorithms for the effective analysis of microarray data. Without loss of generality, we present a list of feature selection algorithms and propose a generic categorizing framework that systematically groups them according to their search strategies and evaluation criteria. The framework also provides guidelines for choosing a feature selection algorithm, both in general and specifically for microarray data analysis. In the context of microarray data analysis, the algorithms are further classified into soft computing and non-soft computing categories, and we present an analysis of their performance on microarray data.
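The abstract describes the framework only through its two axes, search strategy and evaluation criterion. The following Python code is a minimal sketch of how those axes combine. It is not the paper's framework or implementation. The specific criteria (a signal-to-noise ratio filter score and a leave-one-out nearest-centroid wrapper score), the two search strategies (top-k ranking and sequential forward selection), the toy data, and all function names are hypothetical choices made for illustration.

```python
# Illustrative sketch only: the criteria, search strategies, and function
# names below are assumptions chosen for this example, not the paper's code.
from statistics import mean, pstdev
from typing import Callable, Dict, List, Sequence

Matrix = List[List[float]]  # rows = samples (arrays), columns = genes


# Evaluation criterion A (filter): scores one gene without any classifier.
def snr_score(X: Matrix, y: Sequence[int], j: int) -> float:
    """Signal-to-noise ratio |mu1 - mu0| / (sd0 + sd1) of gene j, labels 0/1."""
    g0 = [row[j] for row, label in zip(X, y) if label == 0]
    g1 = [row[j] for row, label in zip(X, y) if label == 1]
    # The small constant avoids division by zero for constant genes.
    return abs(mean(g1) - mean(g0)) / (pstdev(g0) + pstdev(g1) + 1e-12)


# Evaluation criterion B (wrapper): scores a gene subset with a classifier.
def loo_centroid_accuracy(X: Matrix, y: Sequence[int], subset: Sequence[int]) -> float:
    """Leave-one-out accuracy of a nearest-centroid classifier on `subset`.

    Assumes every class has at least two samples, so no class is empty
    when one sample is held out.
    """
    correct = 0
    for i in range(len(X)):
        # Class centroids computed without the held-out sample i.
        centroids: Dict[int, List[float]] = {}
        for label in set(y):
            rows = [X[k] for k in range(len(X)) if k != i and y[k] == label]
            centroids[label] = [mean(r[j] for r in rows) for j in subset]
        # Predict the class whose centroid is nearest (squared Euclidean distance).
        predicted = min(
            centroids,
            key=lambda lab: sum((X[i][j] - c) ** 2 for j, c in zip(subset, centroids[lab])),
        )
        correct += predicted == y[i]
    return correct / len(X)


# Search strategy 1 (ranking): score genes one by one and keep the top k.
def rank_top_k(n_genes: int, score: Callable[[int], float], k: int) -> List[int]:
    return sorted(range(n_genes), key=score, reverse=True)[:k]


# Search strategy 2 (sequential forward selection): greedily grow a subset.
def forward_select(n_genes: int, score: Callable[[List[int]], float], k: int) -> List[int]:
    subset: List[int] = []
    current = float("-inf")
    while len(subset) < k:
        best_gene, best_val = None, current
        for j in range(n_genes):
            if j not in subset:
                val = score(subset + [j])
                if val > best_val:  # require a strict improvement
                    best_gene, best_val = j, val
        if best_gene is None:  # no remaining gene improves the criterion
            break
        subset.append(best_gene)
        current = best_val
    return subset


if __name__ == "__main__":
    # Toy data: 6 samples x 3 genes; gene 0 separates the classes, gene 1 is noise.
    X = [[5.0, 1.0, 2.0], [5.2, 3.0, 2.5], [4.8, 2.0, 1.0],
         [1.0, 2.0, 3.0], [1.2, 1.0, 3.5], [0.8, 3.0, 2.0]]
    y = [0, 0, 0, 1, 1, 1]
    print(rank_top_k(3, lambda j: snr_score(X, y, j), k=2))  # filter + ranking: [0, 2]
    print(forward_select(3, lambda s: loo_centroid_accuracy(X, y, s), k=2))  # wrapper + SFS: [0]
```

In this sketch, swapping the `score` callable moves a method between the filter and wrapper categories, while swapping the search function changes its search strategy. On the toy data, the forward search stops after one gene because gene 0 already gives perfect leave-one-out accuracy and no further gene strictly improves it.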
