Attribute Reduction with Test Cost Constraint

In many machine learning applications, data are not free: each data item carries a test cost. For economic reasons, existing work tries to minimize the total test cost while preserving a particular property of a given decision system. In this paper, we point out that in some applications the test cost one can afford is limited; hence one has to sacrifice certain properties to keep the test cost under a budget. To formalize this issue, we define the test-cost-constraint attribute reduction problem, where the optimization objective is to minimize the conditional information entropy. This problem generalizes both the test-cost-sensitive attribute reduction problem and the 0-1 knapsack problem, and is therefore more challenging. We propose a heuristic algorithm based on information gain and test costs to deal with the new problem. The algorithm is tested on four UCI (University of California, Irvine) datasets with various test cost settings. Experimental results indicate an appropriate setting for the only user-specified parameter λ.
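The paper itself is not reproduced here, but the abstract's description of the heuristic can be sketched as follows: greedily add the attribute whose information gain, weighted by its test cost raised to the power λ (λ ≤ 0 penalizes expensive tests), is largest, while a running total of test costs stays under the budget. This is a hypothetical minimal sketch under those assumptions; the function names, the toy decision table, and the exact scoring formula `gain * cost**lam` are illustrative, not the authors' exact algorithm.

```python
from collections import Counter
import math

def entropy(labels):
    # Shannon entropy H(D) of a list of class labels, in bits
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def conditional_entropy(rows, attrs, labels):
    # H(D | B): partition rows by their values on attrs (the set B),
    # then take the size-weighted average of each block's label entropy
    groups = {}
    for row, y in zip(rows, labels):
        groups.setdefault(tuple(row[a] for a in attrs), []).append(y)
    n = len(labels)
    return sum(len(g) / n * entropy(g) for g in groups.values())

def budget_reduction(rows, labels, costs, budget, lam=-1.0):
    """Greedy attribute selection under a test-cost budget (illustrative).

    Each candidate is scored by its information gain weighted by
    cost**lam; selection stops when no affordable attribute still
    reduces the conditional entropy.
    """
    selected, spent = [], 0.0
    remaining = sorted(costs)  # fixed order for reproducibility
    while remaining:
        current = conditional_entropy(rows, selected, labels)
        best, best_score = None, 0.0
        for a in remaining:
            if spent + costs[a] > budget:
                continue  # would exceed the test cost budget
            gain = current - conditional_entropy(rows, selected + [a], labels)
            score = gain * costs[a] ** lam
            if score > best_score:
                best, best_score = a, score
        if best is None:
            break  # nothing affordable improves the entropy
        selected.append(best)
        spent += costs[best]
        remaining.remove(best)
    return selected, spent

# Toy decision table: three binary tests with different costs
rows = [
    {'a': 0, 'b': 0, 'c': 0},
    {'a': 0, 'b': 1, 'c': 1},
    {'a': 1, 'b': 0, 'c': 0},
    {'a': 1, 'b': 1, 'c': 1},
]
labels = [0, 0, 0, 1]
costs = {'a': 2.0, 'b': 3.0, 'c': 1.0}
reduct, spent = budget_reduction(rows, labels, costs, budget=6.0)
```

With λ = −1, the cheap test `c` is picked first despite tying on raw gain, and the search then adds `a` to drive the conditional entropy to zero, staying well under the budget.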
