Feature selection using hybrid poor and rich optimization algorithm for text classification

Abstract: Feature selection plays a significant role in reducing the high-dimensional feature space in text classification. Reducing the dimensionality of the feature space lowers the computational cost and improves the accuracy of the text classification system, so identifying an optimal combination of features is an essential task. This paper introduces a novel hybrid feature selection method based on the binary poor and rich optimization algorithm (HBPRO) to obtain an appropriate subset of features. The selected feature subset is evaluated using a Naïve Bayes classifier on two popular benchmark text corpora. The experimental results confirm that the proposed feature selection scheme (HBPRO) achieves higher accuracy with fewer features than other feature selection techniques.
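To make the evaluation step concrete, the following is a minimal sketch of how a candidate binary feature mask could be scored with a Naïve Bayes classifier. It is not the HBPRO method itself: the poor and rich update rules are not described in this abstract and are not reproduced here. The function names, the toy term-count data, and the weighted fitness (accuracy combined with feature reduction through an assumed `alpha`) are illustrative assumptions, not details taken from the paper.

```python
import math

def train_nb(docs, labels, mask):
    """Fit a multinomial Naive Bayes model (Laplace smoothing) on masked features."""
    selected = [j for j, keep in enumerate(mask) if keep]
    log_prior, log_like = {}, {}
    for c in sorted(set(labels)):
        rows = [d for d, y in zip(docs, labels) if y == c]
        log_prior[c] = math.log(len(rows) / len(docs))
        # Add-one smoothing so unseen terms do not yield log(0).
        counts = [sum(r[j] for r in rows) + 1 for j in selected]
        total = sum(counts)
        log_like[c] = [math.log(n / total) for n in counts]
    return selected, log_prior, log_like

def predict_nb(model, doc):
    """Return the class with the highest log-posterior score."""
    selected, log_prior, log_like = model
    def score(c):
        return log_prior[c] + sum(doc[j] * w for j, w in zip(selected, log_like[c]))
    return max(log_prior, key=score)

def fitness(mask, train_docs, train_labels, test_docs, test_labels, alpha=0.99):
    """Hypothetical fitness: weighted sum of accuracy and feature reduction.

    alpha is an assumed trade-off weight, not a value from the paper.
    """
    if not any(mask):
        return 0.0  # an empty subset cannot classify anything
    model = train_nb(train_docs, train_labels, mask)
    correct = sum(predict_nb(model, d) == y for d, y in zip(test_docs, test_labels))
    accuracy = correct / len(test_docs)
    reduction = 1 - sum(mask) / len(mask)
    return alpha * accuracy + (1 - alpha) * reduction

# Toy term-count vectors (4 terms); terms 0 and 1 are discriminative,
# terms 2 and 3 are noise. Purely illustrative data.
train_docs = [[3, 0, 1, 0], [2, 0, 0, 1], [0, 3, 1, 0], [0, 2, 0, 1]]
train_labels = ["sport", "sport", "tech", "tech"]
test_docs = [[1, 0, 1, 0], [0, 1, 0, 1]]
test_labels = ["sport", "tech"]

full = fitness([1, 1, 1, 1], train_docs, train_labels, test_docs, test_labels)
reduced = fitness([1, 1, 0, 0], train_docs, train_labels, test_docs, test_labels)
noise = fitness([0, 0, 1, 1], train_docs, train_labels, test_docs, test_labels)
print(full, reduced, noise)
```

In this toy setting, the mask that keeps only the two discriminative terms scores higher than the full feature set, because accuracy is unchanged while the subset is smaller. A binary optimizer would search over such masks, using a score of this kind to compare candidates.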
