An exponent weighted algorithm for minimal cost feature selection

Minimal cost feature selection plays a crucial role in cost-sensitive learning. It aims to determine a feature subset that minimizes the average total cost by trading off test costs against misclassification costs. Recently, a backtracking algorithm has been developed to tackle this problem. Unfortunately, the algorithm's efficiency on large datasets is often unacceptable. Moreover, its run time grows significantly as misclassification costs increase. In this paper, we develop an exponent weighted algorithm for minimal cost feature selection, in which an exponent weighted function of feature significance is constructed to improve efficiency. This function combines information entropy, test cost, and a user-specified non-positive exponent. The effectiveness of our algorithm is demonstrated on six UCI datasets with two representative test cost distributions. Compared with the existing backtracking algorithm, the proposed algorithm is significantly more efficient, and its run time is unaffected by the misclassification cost setting.
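The abstract does not give the exact form of the weighting function, but a minimal sketch of the kind of heuristic it describes, information gain scaled by test cost raised to a user-specified non-positive exponent lambda, in the spirit of the f(a) = gain(a) * c(a)^lambda form used in related test-cost-sensitive attribute reduction work, might look as follows in Python. The function and variable names here are illustrative, not taken from the paper.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def conditional_entropy(column, labels):
    """Entropy of the labels conditioned on the values of one feature column."""
    n = len(labels)
    groups = {}
    for v, y in zip(column, labels):
        groups.setdefault(v, []).append(y)
    return sum(len(g) / n * entropy(g) for g in groups.values())

def weighted_significance(column, labels, test_cost, lam):
    """Exponent weighted feature significance (illustrative form, not the
    paper's exact definition): information gain scaled by test_cost ** lam.

    lam is the user-specified non-positive exponent: lam = 0 ignores test
    costs entirely, and more negative values penalize costly features more
    heavily.
    """
    assert lam <= 0, "the exponent must be non-positive"
    gain = entropy(labels) - conditional_entropy(column, labels)
    return gain * (test_cost ** lam)

# Toy usage: one binary feature, three samples.
feature = [0, 1, 1]
labels = ["yes", "no", "no"]
print(weighted_significance(feature, labels, test_cost=4.0, lam=-0.5))
```

Under this form, a greedy selector would repeatedly add the feature with the highest weighted significance, which avoids the exhaustive search that makes the backtracking algorithm slow on large datasets.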
