RFCBF: enhance the performance and stability of Fast Correlation-Based Filter

Feature selection is a preprocessing step that plays a crucial role in machine learning and data mining. Feature selection methods have been shown to be effective at removing redundant and irrelevant features, which improves the prediction performance of the learning algorithm. Among the feature selection methods based on redundancy, the fast correlation-based filter (FCBF) is one of the most effective. In this paper, we propose a novel extension of FCBF, called RFCBF, which incorporates a resampling technique to improve classification accuracy. We performed comprehensive experiments comparing RFCBF with other state-of-the-art feature selection methods using the k-nearest-neighbor (KNN) classifier on 12 publicly available data sets. The experimental results show that the RFCBF algorithm yields significantly better results than previous state-of-the-art methods in terms of classification accuracy and runtime.
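To make the idea concrete, the following is a minimal sketch of the standard FCBF procedure (ranking features by symmetrical uncertainty with the class, then discarding features that are more strongly correlated with an already selected feature than with the class), wrapped in a simple bootstrap vote. The abstract does not specify how RFCBF performs its resampling, so the bootstrap wrapper, the function names, and the parameters `n_rounds`, `min_freq`, and `delta` are all assumptions made for illustration and should not be read as the authors' algorithm. The sketch also assumes that all features are already discretized.

```python
import numpy as np

def entropy(x):
    """Shannon entropy (in bits) of a discrete 1-D array."""
    _, counts = np.unique(x, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def joint_entropy(x, y):
    """Shannon entropy (in bits) of the joint distribution of two discrete arrays."""
    pairs = np.stack([x, y], axis=1)
    _, counts = np.unique(pairs, axis=0, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def symmetrical_uncertainty(x, y):
    """SU(x, y) = 2 * I(x; y) / (H(x) + H(y)), a value in [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0  # both variables are constant
    mutual_info = hx + hy - joint_entropy(x, y)
    return 2.0 * mutual_info / (hx + hy)

def fcbf(X, y, delta=0.0):
    """Plain FCBF on discrete features; returns selected column indices."""
    su_class = np.array(
        [symmetrical_uncertainty(X[:, j], y) for j in range(X.shape[1])]
    )
    # Keep features whose relevance exceeds delta, most relevant first.
    order = np.argsort(-su_class, kind="stable")
    candidates = [j for j in order if su_class[j] > delta]
    selected = []
    while candidates:
        fp = candidates.pop(0)  # current predominant feature
        selected.append(int(fp))
        # Drop every remaining feature that is at least as correlated
        # with fp as it is with the class (redundant given fp).
        candidates = [
            fq for fq in candidates
            if symmetrical_uncertainty(X[:, fp], X[:, fq]) < su_class[fq]
        ]
    return selected

def resampled_fcbf(X, y, n_rounds=20, min_freq=0.5, delta=0.0, seed=0):
    """Hypothetical resampling wrapper (an assumption, not the paper's method):
    run FCBF on bootstrap samples and keep features selected often enough."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    votes = np.zeros(X.shape[1])
    for _ in range(n_rounds):
        idx = rng.integers(0, n, size=n)  # bootstrap sample with replacement
        for j in fcbf(X[idx], y[idx], delta):
            votes[j] += 1
    return [int(j) for j in np.flatnonzero(votes / n_rounds >= min_freq)]
```

In this sketch, a feature that duplicates an already selected feature is removed by the redundancy check, and a feature that is selected only in a few bootstrap rounds is filtered out by the frequency threshold. Voting over resamples is one common way to stabilize a filter; other schemes (for example, averaging the symmetrical uncertainty scores before a single FCBF pass) would be equally consistent with the abstract.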
