Recent studies in data mining have proposed a new classification approach, called associative classification, which, according to several reports, such as [7, 6], achieves higher classification accuracy than traditional classification approaches such as C4.5. However, the approach also suffers from two major deficiencies: (1) it generates a very large number of association rules, which leads to high processing overhead; and (2) its confidence-based rule evaluation measure may lead to overfitting. In comparison with associative classification, traditional rule-based classifiers, such as C4.5, FOIL and RIPPER, are substantially faster but their accuracy, in most cases, may not be as high. In this paper, we propose a new classification approach, CPAR (Classification based on Predictive Association Rules), which combines the advantages of both associative classification and traditional rule-based classification. Instead of generating a large number of candidate rules as in associative classification, CPAR adopts a greedy algorithm to generate rules directly from training data. Moreover, CPAR generates and tests more rules than traditional rule-based classifiers to avoid missing important rules. To avoid overfitting, CPAR uses expected accuracy to evaluate each rule and uses the best k rules in prediction.
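The last step of the abstract (score each rule by expected accuracy, then predict with the best k rules) can be illustrated with a minimal sketch. The abstract does not give the formula, so the Laplace-style estimate used here is an assumption. The rule representation and the names `laplace_accuracy` and `predict_best_k` are also hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def laplace_accuracy(n_correct, n_covered, n_classes):
    """Laplace-smoothed accuracy estimate for one rule.

    Assumption: the abstract only says "expected accuracy"; this particular
    smoothing is an illustrative choice, not confirmed by the text above.
    """
    return (n_correct + 1) / (n_covered + n_classes)


def predict_best_k(rules, example, k=5):
    """Predict a class label from the k best matching rules per class.

    rules: list of (conditions, label, expected_accuracy), where conditions
           is a dict mapping attribute name to required value (hypothetical
           representation).
    Returns the label whose top-k matching rules have the highest mean
    expected accuracy, or None when no rule matches.
    """
    by_class = defaultdict(list)
    for conditions, label, accuracy in rules:
        # A rule matches when every one of its conditions holds for the example.
        if all(example.get(attr) == value for attr, value in conditions.items()):
            by_class[label].append(accuracy)
    if not by_class:
        return None

    def mean_of_top_k(label):
        top = sorted(by_class[label], reverse=True)[:k]
        return sum(top) / len(top)

    return max(by_class, key=mean_of_top_k)
```

Averaging only the top k rules per class, rather than all matching rules, keeps a class with many weak rules from outvoting a class with a few strong ones; a single low-confidence rule cannot decide the prediction on its own either.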
[1] Johannes Gehrke, et al. RainForest - A Framework for Fast Decision Tree Construction of Large Datasets. Data Mining and Knowledge Discovery, 1998.
[2] Ramakrishnan Srikant, et al. Fast algorithms for mining association rules. VLDB, 1998.
[3] R. Mike Cameron-Jones, et al. FOIL: A Midterm Report. ECML, 1993.
[4] Jian Pei, et al. CMAR: accurate and efficient classification based on multiple class-association rules. Proceedings 2001 IEEE International Conference on Data Mining, 2001.
[5] Wynne Hsu, et al. Integrating Classification and Association Rule Mining. KDD, 1998.
[6] Peter Clark, et al. Rule Induction with CN2: Some Recent Improvements. EWSL, 1991.
[7] Jian Pei, et al. Mining frequent patterns without candidate generation. SIGMOD '00, 2000.
[8] J. Ross Quinlan, et al. C4.5: Programs for Machine Learning. 1992.
[9] William W. Cohen. Fast Effective Rule Induction. ICML, 1995.
[10] Aiko M. Hormann, et al. Programs for Machine Learning. Part I. Inf. Control, 1962.