Windowing has been proposed as a procedure for efficient memory use in the ID3 decision tree learning algorithm. However, previous work has shown that windowing may often lead to a decrease in performance. In this work, we argue that separate-and-conquer rule learning algorithms are better suited to windowing than divide-and-conquer algorithms, because they learn rules independently and are less susceptible to changes in class distributions. In particular, we present a new windowing algorithm that achieves additional gains in efficiency by exploiting this property of separate-and-conquer algorithms. While the presented algorithm is only suitable for redundant, noise-free data sets, we also briefly discuss the problem of noisy data in windowing and present some preliminary ideas on how it might be addressed by extending the algorithm introduced in this paper.
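To make the basic procedure concrete, the following is a minimal Python sketch of the generic windowing loop common to these approaches: learn on a small subsample (the window), test on the remaining examples, move misclassified examples into the window, and relearn until the model is consistent. The names `learn`, `init_size`, `max_inc`, and `max_rounds` are our placeholders, not the paper's; the paper's own algorithm additionally exploits the rule independence of separate-and-conquer learners, which this sketch does not show.

```python
import random

def windowing(examples, learn, init_size=100, max_inc=100, max_rounds=10):
    """Generic windowing loop (a sketch, not the paper's exact algorithm).

    `examples` is a list of (features, label) pairs; `learn` is any batch
    learner mapping a list of labeled examples to a classifier.
    """
    random.shuffle(examples)
    window, rest = examples[:init_size], examples[init_size:]
    model = learn(window)
    for _ in range(max_rounds):
        # Examples outside the window that the current model misclassifies.
        errors = [(x, y) for (x, y) in rest if model(x) != y]
        if not errors:
            break  # consistent with all data; relies on noise-free data
        # Move up to max_inc misclassified examples into the window.
        added = errors[:max_inc]
        window += added
        rest = [ex for ex in rest if ex not in added]
        model = learn(window)
    return model
```

Note that the stopping criterion (no misclassified examples outside the window) presupposes noise-free data, which is why the abstract restricts the presented algorithm to redundant, noise-free data sets.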