A fast meta-heuristic approach for the (α,β)-k-feature set problem

The feature selection problem aims to choose a subset of a given set of features that best represents the whole in a particular aspect, preserving the original semantics of the variables on the given samples and classes. In 2004, a new approach to feature selection was proposed, based on an NP-complete combinatorial optimisation problem called the (α,β)-k-feature set problem. The approach proved effective in many practical cases and became an important feature selection tool, but the only existing solution method, proposed in the original paper, was found not to work well for several instances. Our work aims to fill this gap in the literature by quickly obtaining high-quality solutions for the instances that the existing approach cannot solve. We propose a heuristic based on the greedy randomised adaptive search procedure (GRASP) and tabu search to address this problem, together with benchmark instances to evaluate its performance. The computational results show that our method obtains high-quality solutions for both real instances and the proposed artificial instances, and requires only a fraction of the computational resources required by the state-of-the-art exact and heuristic approaches that use mixed integer programming models.
