This paper describes efficient methods for exact and approximate implementation of the MIN-FEATURES bias, which prefers consistent hypotheses definable over as few features as possible. This bias is useful in learning domains where many irrelevant features are present in the training data. We first introduce FOCUS-2, a new algorithm that exactly implements the MIN-FEATURES bias. This algorithm is empirically shown to be substantially faster than the FOCUS algorithm previously given in [Almuallim and Dietterich, 1991]. We then introduce the Mutual-Information-Greedy, Simple-Greedy, and Weighted-Greedy algorithms, which apply efficient greedy heuristics to approximate the MIN-FEATURES bias, trading optimality for computational efficiency. Experimental studies show that the learning performance of ID3 is greatly improved when these algorithms are used to preprocess the training data by eliminating irrelevant features from ID3's consideration. In particular, the Weighted-Greedy algorithm provides an excellent and efficient approximation of the MIN-FEATURES bias.
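As a concrete illustration of the greedy approximation strategy, the Python sketch below applies a standard greedy set-cover heuristic to feature selection: at each step it picks the feature that distinguishes the largest number of remaining positive/negative example pairs. This is a minimal sketch under assumed inputs (a list of Boolean feature tuples with Boolean labels); the function name `greedy_min_features` and the data layout are illustrative, and the code is not the paper's exact Simple-Greedy or Weighted-Greedy implementation.

```python
from itertools import product

def greedy_min_features(examples):
    """Greedily pick features until every positive/negative example
    pair is distinguished, in the spirit of greedy set cover.

    `examples` is a list of (feature_tuple, label) pairs with Boolean
    labels. A "conflict" is a (positive, negative) example pair; a
    feature covers a conflict if the two examples disagree on it.
    """
    positives = [x for x, y in examples if y]
    negatives = [x for x, y in examples if not y]
    n = len(examples[0][0])

    # Every (positive, negative) index pair must be distinguished by
    # at least one selected feature.
    conflicts = set(product(range(len(positives)), range(len(negatives))))
    selected = []

    while conflicts:
        remaining = [f for f in range(n) if f not in selected]
        if not remaining:
            break  # no feature set separates all pairs: data inconsistent

        def covered(f):
            return {(i, j) for (i, j) in conflicts
                    if positives[i][f] != negatives[j][f]}

        # Greedy choice: the feature covering the most remaining conflicts.
        best = max(remaining, key=lambda f: len(covered(f)))
        newly_covered = covered(best)
        if not newly_covered:
            break  # remaining conflicts cannot be covered
        conflicts -= newly_covered
        selected.append(best)
    return selected

# Toy usage: the label depends only on feature 0; features 1 and 2
# are irrelevant, so the heuristic selects feature 0 alone.
data = [((0, 1, 0), False), ((1, 1, 0), True),
        ((0, 0, 1), False), ((1, 0, 1), True)]
print(greedy_min_features(data))  # -> [0]
```

Because covering all conflicts reduces to minimum set cover, the greedy choice inherits the usual logarithmic approximation guarantee, which is what makes such heuristics attractive substitutes for the exponential-time exact search.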
[1] Manabu Ichino et al. Optimum feature selection by zero-one integer programming. IEEE Transactions on Systems, Man, and Cybernetics, 1984.
[2] Anthony N. Mucciardi et al. A Comparison of Seven Techniques for Choosing Subsets of Pattern Recognition Properties. IEEE Transactions on Computers, 1971.
[3] Keinosuke Fukunaga et al. A Branch and Bound Algorithm for Feature Subset Selection. IEEE Transactions on Computers, 1977.
[4] David Haussler et al. Learnability and the Vapnik-Chervonenkis dimension. JACM, 1989.
[5] Hussein Almuallim et al. Concept coverage and its application to two learning tasks, 1992.
[6] Hussein Almuallim and Thomas G. Dietterich. Learning with Many Irrelevant Features. AAAI, 1991.