Improving Classification Accuracy by Means of the Sliding Window Method in Consistency-Based Feature Selection

In the digital era, collecting relevant information about a technological process has become increasingly cheap and easy. However, owing to the sheer volume of available data, supervised classification remains one of the most challenging tasks in artificial intelligence. Feature selection addresses this problem by removing irrelevant and redundant features from the data. In this paper we propose a new feature selection algorithm, Swcfs, which performs well on high-dimensional and noisy data. Swcfs detects noisy features by sliding a window over the set of consecutive features ranked by their non-linear correlation with the class feature. The metric Swcfs uses to evaluate feature sets with respect to their relevance to the class label is the Bayesian risk, which represents the theoretical upper error bound of deterministic classification. Experiments show that Swcfs is more accurate than most state-of-the-art feature selection algorithms.
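The mechanism described above can be illustrated with a simplified sketch: rank features by symmetrical uncertainty (one common non-linear correlation measure; the abstract does not specify the paper's exact ranking criterion), then slide a window over the ranked list and discard any window of consecutive features whose removal does not increase the empirical Bayesian risk. All function names, the window size, and the tolerance parameter below are illustrative assumptions, not the authors' implementation.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (bits) of a list of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def symmetrical_uncertainty(feature, target):
    """Non-linear correlation in [0, 1]: 2 * IG(feature; target) / (H(feature) + H(target))."""
    h_f, h_t = entropy(feature), entropy(target)
    info_gain = h_f + h_t - entropy(list(zip(feature, target)))
    return 2 * info_gain / (h_f + h_t) if h_f + h_t > 0 else 0.0

def bayesian_risk(columns, target):
    """Empirical Bayesian risk of a feature subset: 1 minus the mass of the
    majority class within each distinct feature-value pattern."""
    n = len(target)
    groups = {}
    for i in range(n):
        key = tuple(col[i] for col in columns)  # value pattern of this sample
        groups.setdefault(key, []).append(target[i])
    correct = sum(max(Counter(labels).values()) for labels in groups.values())
    return 1 - correct / n

def swcfs_sketch(features, target, window=2, delta=0.0):
    """Hypothetical sliding-window selection: features is a list of columns.
    Rank by symmetrical uncertainty, then try to drop each window of
    consecutive ranked features if the risk does not grow beyond delta."""
    order = sorted(range(len(features)),
                   key=lambda j: symmetrical_uncertainty(features[j], target),
                   reverse=True)
    selected = list(order)
    base = bayesian_risk([features[j] for j in selected], target)
    i = 0
    while i < len(selected):
        trial = selected[:i] + selected[i + window:]  # remove one window
        if trial and bayesian_risk([features[j] for j in trial], target) <= base + delta:
            selected = trial  # window was noise: keep it removed
        else:
            i += 1            # window carries information: advance
    return sorted(selected)
```

On a toy dataset with one perfectly predictive feature and two noise features, this sketch retains only the predictive one; the real algorithm's window policy and consistency test may of course differ.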
