Feature selection algorithm recommendation for gene expression data through gradient boosting and neural network metamodels

Feature selection is an important step in gene expression data analysis. However, many feature selection methods exist, and costly experimentation is usually needed to determine the most suitable one for a given problem. This paper applies gradient boosting and neural network techniques to build metamodels that recommend rankings of {feature selection - classification} algorithm pairs for new gene expression classification problems. Results on a corpus of 60 public data sets show that these techniques produce more useful rankings than classical metamodels.
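To make the idea of a ranking metamodel concrete, the following is a minimal sketch and not the paper's implementation. It assumes a pointwise formulation: one small gradient-boosted regressor (decision stumps, squared loss) per algorithm pair predicts that pair's performance from a data set's metafeatures, and the pairs are then sorted by predicted performance. The metafeature values, the performance scores, and the pair names (`relief+svm`, `chi2+knn`, `none+tree`) are hypothetical toy values chosen only for illustration.

```python
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]
Model = Callable[[Vector], float]


def fit_stump(X: List[Vector], residuals: List[float]) -> Model:
    """Fit a one-split regression stump that minimises squared error."""
    best = None
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2.0
            left = [r for row, r in zip(X, residuals) if row[j] <= thr]
            right = [r for row, r in zip(X, residuals) if row[j] > thr]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, thr, lm, rm)
    if best is None:  # every feature is constant: fall back to the mean
        mean = sum(residuals) / len(residuals)
        return lambda x: mean
    _, j, thr, lm, rm = best
    return lambda x: lm if x[j] <= thr else rm


def fit_boosted_regressor(X: List[Vector], y: List[float],
                          n_rounds: int = 50, learning_rate: float = 0.1) -> Model:
    """Gradient boosting for squared loss: each stump fits the current residuals."""
    base = sum(y) / len(y)
    stumps: List[Model] = []
    preds = [base] * len(y)
    for _ in range(n_rounds):
        residuals = [t - p for t, p in zip(y, preds)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump(x) for p, x in zip(preds, X)]
    return lambda x: base + learning_rate * sum(s(x) for s in stumps)


def recommend_ranking(models: Dict[str, Model], metafeatures: Vector) -> List[str]:
    """Rank algorithm pairs by predicted performance, best first."""
    scores = {pair: model(metafeatures) for pair, model in models.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical meta-data set: one row of metafeatures per training data set.
train_metafeatures = [[0.1, 0.9], [0.2, 0.8], [0.3, 0.7],
                      [0.7, 0.3], [0.8, 0.2], [0.9, 0.1]]
# Hypothetical accuracy of each algorithm pair on those six data sets.
performance = {
    "relief+svm": [0.90, 0.88, 0.85, 0.60, 0.58, 0.55],
    "chi2+knn": [0.60, 0.62, 0.65, 0.85, 0.88, 0.90],
    "none+tree": [0.70, 0.70, 0.70, 0.70, 0.70, 0.70],
}

# One metamodel per algorithm pair.
models = {pair: fit_boosted_regressor(train_metafeatures, scores)
          for pair, scores in performance.items()}

# Recommend a ranking for a new, unseen data set described by its metafeatures.
print(recommend_ranking(models, [0.12, 0.88]))
```

A pointwise regressor per pair is only one possible design. A listwise or pairwise learning-to-rank objective would optimise the ranking directly, at the cost of a more involved training setup.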
