Paper Information - Linear-Time Rule Induction

Linear-Time Rule Induction

The recent emergence of data mining as a major application of machine learning has led to increased interest in fast rule induction algorithms. These are able to efficiently process large numbers of examples, under the constraint of still achieving good accuracy. If e is the number of examples, many rule learners have O(e^4) asymptotic time complexity in noisy domains, and C4.5RULES has been empirically observed to sometimes require O(e^3). Recent advances have brought this bound down to O(e log^2 e), while maintaining accuracy at the level of C4.5RULES's. In this paper we present CWS, a new algorithm with guaranteed O(e) complexity, and verify that it outperforms C4.5RULES and CN2 in time, accuracy and output size on two large datasets. For example, on NASA's space shuttle database, running time is reduced from over a month (for C4.5RULES) to a few hours, with a slight gain in accuracy. CWS is based on interleaving the induction of all the rules and evaluating performance globally instead of locally (i.e., it uses a "conquering without separating" strategy as opposed to a "separate and conquer" one). Its bias is appropriate to domains where the underlying concept is simple and the data is plentiful but noisy.
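The "conquering without separating" idea in the abstract can be illustrated with a small Python sketch. This is not the CWS algorithm as published: the function names (`covers`, `score_rule`, `classify`, `accuracy`, `learn`), the Laplace-corrected rule confidence, and the "most confident covering rule wins" tie-breaking are all assumptions made for illustration. It also re-scores the whole rule set from scratch at every step, so it does not attain the O(e) bound the abstract claims. What it does show is the structural point: every rule is grown in an interleaved fashion, no examples are ever removed, and each candidate condition is judged by the accuracy of the entire rule set.

```python
from collections import Counter

def covers(rule, x):
    # A rule is a dict {attribute_index: required_value} (assumed representation).
    return all(x[a] == v for a, v in rule.items())

def score_rule(rule, X, y, classes):
    # Predicted class and Laplace-corrected confidence, measured on ALL examples
    # (nothing is "separated" out of the training set).
    counts = Counter(label for x, label in zip(X, y) if covers(rule, x))
    cls = max(classes, key=lambda c: counts[c])
    return cls, (counts[cls] + 1) / (sum(counts.values()) + len(classes))

def classify(scored, x, default):
    # The most confident covering rule wins; otherwise use the default class.
    pred, best = default, 0.0
    for rule, cls, conf in scored:
        if conf > best and covers(rule, x):
            pred, best = cls, conf
    return pred

def accuracy(rules, X, y, classes, default):
    # Global evaluation: accuracy of the whole rule set on the whole training set.
    scored = [(r, *score_rule(r, X, y, classes)) for r in rules]
    hits = sum(classify(scored, x, default) == label for x, label in zip(X, y))
    return hits / len(X)

def learn(X, y):
    classes = sorted(set(y))
    default = Counter(y).most_common(1)[0][0]
    conditions = sorted({(a, v) for x in X for a, v in enumerate(x)})
    rules, best_acc = [], accuracy([], X, y, classes, default)
    while True:
        pool = rules + [{}]  # tentatively add one empty rule per cycle
        best_trial = None
        for i, rule in enumerate(pool):
            for a, v in conditions:
                if a in rule:
                    continue
                # Specialise rule i with one condition; drop still-empty rules.
                trial = [r for r in pool[:i] + [{**rule, a: v}] + pool[i + 1:] if r]
                acc = accuracy(trial, X, y, classes, default)
                if acc > best_acc:
                    best_acc, best_trial = acc, trial
        if best_trial is None:  # no single condition improves global accuracy
            return rules, default
        rules = best_trial

# Toy data (hypothetical): the class equals the first attribute.
X = [(0, 0), (0, 1), (1, 0), (1, 1)] * 5
y = [x[0] for x in X]
rules, default = learn(X, y)
```

On this toy data the sketch stops with one rule, `{0: 1}`, plus the default class `0`. A separate-and-conquer learner would instead finish one rule, remove the examples it covers, and only then start the next rule.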

Pedro M. Domingos
