Feature selection based on an improved cat swarm optimization algorithm for big data classification

Feature selection is an optimization problem that is generally solved by combining an optimization algorithm with a classifier. Genetic algorithms and particle swarm optimization (PSO) are two commonly used optimization algorithms. More recently, cat swarm optimization (CSO) was proposed and shown to outperform PSO; however, CSO is limited by long computation times. In this paper, we modify CSO to obtain an improved algorithm, ICSO, and apply it to feature selection in a text classification experiment for big data. The results show that ICSO outperforms traditional CSO, and that using term frequency-inverse document frequency (TF-IDF) with ICSO for feature selection yields more accurate classification than using TF-IDF alone.
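To make the wrapper idea concrete, the following minimal Python sketch scores a binary feature mask over TF-IDF weights by the accuracy of a classifier trained on the selected features only. It is an illustration under stated assumptions, not the ICSO algorithm: the abstract does not give the CSO update rules or the classifier used, so a plain random search and a nearest-centroid classifier stand in for them. All function names (`tfidf`, `fitness`, `random_search`) are hypothetical.

```python
import math
import random
from collections import Counter


def tfidf(docs):
    """Return (vocab, matrix); matrix[i][j] is the TF-IDF weight of term j in doc i.

    Assumes whitespace tokenization and non-empty documents.
    """
    tokenized = [d.lower().split() for d in docs]
    vocab = sorted({t for doc in tokenized for t in doc})
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(t for doc in tokenized for t in set(doc))
    idf = {t: math.log(n_docs / df[t]) + 1.0 for t in vocab}
    matrix = []
    for doc in tokenized:
        counts = Counter(doc)
        matrix.append([counts[t] / len(doc) * idf[t] for t in vocab])
    return vocab, matrix


def fitness(mask, X, y):
    """Leave-one-out accuracy of a nearest-centroid classifier on masked features.

    The classifier is an assumption chosen to keep the sketch dependency-free.
    """
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0  # an empty feature subset cannot classify anything
    correct = 0
    for i in range(len(X)):
        # Build per-class sums from every document except document i.
        sums, counts = {}, Counter()
        for k in range(len(X)):
            if k == i:
                continue
            s = sums.setdefault(y[k], [0.0] * len(cols))
            for c, j in enumerate(cols):
                s[c] += X[k][j]
            counts[y[k]] += 1

        def dist(label):
            # Squared Euclidean distance from document i to the class centroid.
            return sum(
                (X[i][j] - sums[label][c] / counts[label]) ** 2
                for c, j in enumerate(cols)
            )

        predicted = min(sorted(sums), key=dist)
        correct += predicted == y[i]
    return correct / len(X)


def random_search(X, y, iterations=200, seed=0):
    """Stand-in optimizer (NOT CSO/ICSO): sample random masks, keep the best."""
    rng = random.Random(seed)
    n_features = len(X[0])
    best_mask = [1] * n_features
    best_score = fitness(best_mask, X, y)
    for _ in range(iterations):
        mask = [rng.randint(0, 1) for _ in range(n_features)]
        score = fitness(mask, X, y)
        # Prefer higher accuracy; break ties in favor of fewer features.
        if score > best_score or (score == best_score and sum(mask) < sum(best_mask)):
            best_mask, best_score = mask, score
    return best_mask, best_score


if __name__ == "__main__":
    docs = [
        "cats chase mice at night",
        "the cat sleeps all day",
        "stocks fell as markets closed",
        "markets rallied and stocks rose",
    ]
    labels = ["pets", "pets", "finance", "finance"]
    vocab, X = tfidf(docs)
    mask, score = random_search(X, labels)
    print([t for t, keep in zip(vocab, mask) if keep], score)
```

A swarm-based optimizer such as CSO would replace `random_search` while reusing the same `fitness` function; only the strategy for proposing candidate masks changes.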
