Impact of Membership and Non-membership Features on Classification Decision: An Empirical Study for Appraisal of Feature Selection Methods

In text categorization, the discriminative power of the classifier, the characteristics of the dataset, and the construction of a representative feature set all play an important role in the classification decision. In practice, filter-based feature selection methods are preferred over wrapper and embedded methods. To construct an illustrative feature set, a number of global and local filter-based feature selection methods are used, each with its respective pros and cons. Whether membership and non-membership features are included in or excluded from the constructed feature set depends on the discriminative power of the feature selection method. A few studies have reported the impact of non-membership features on the classification decision; however, to the best of our knowledge, there is no detailed study that calibrates the effectiveness of feature selection methods in terms of the inclusion of non-membership features to improve classification decisions. Consequently, in this paper, we conduct an empirical study to investigate the effectiveness of four well-known filter-based feature selection methods, namely IG, $\chi^2$, RF, and DF. We then perform a case study in the context of classifying the Gang-of-Four software design patterns. The results show that a balanced consideration of membership and non-membership features has a positive impact on classifier performance and that the classification decision can be improved. We also conclude that random forest (RF) is the best among the existing methods at considering an equal number of membership and non-membership features, and that the classifiers perform better with this method compared to the others.
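To make the filter-based setting concrete, the following is a minimal sketch (not the paper's pipeline) of how a filter criterion such as $\chi^2$ scores terms in a document-term matrix and retains the top-k features; scikit-learn is assumed, and the toy corpus, labels, and variable names are illustrative only.

```python
# Minimal sketch of filter-based feature selection for text categorization.
# A filter method scores each term against the class labels independently of
# any classifier, then keeps the k highest-scoring terms.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, chi2

docs = [
    "observer pattern notifies dependent objects of state changes",
    "singleton pattern ensures a single shared instance",
    "factory method defers object creation to subclasses",
    "adapter converts one interface into another interface",
]
labels = [0, 1, 1, 0]  # toy class labels for illustration only

# Build the document-term count matrix (the raw feature space).
X = CountVectorizer().fit_transform(docs)

# Score every term with the chi-square statistic and keep the top 5.
selector = SelectKBest(score_func=chi2, k=5)
X_reduced = selector.fit_transform(X, labels)

print(X_reduced.shape)  # same documents, only the 5 surviving terms
```

Swapping `score_func` (e.g. to a mutual-information scorer as a stand-in for IG) changes which terms survive; the point of the study above is that such criteria differ in how many non-membership features they admit into the reduced set.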
