Paper Information - CPAR: Classification based on Predictive Association Rules

Recent studies in data mining have proposed a new classification approach, called associative classification, which, according to several reports, such as [7, 6], achieves higher classification accuracy than traditional classification approaches such as C4.5. However, the approach also suffers from two major deficiencies: (1) it generates a very large number of association rules, which leads to high processing overhead; and (2) its confidence-based rule evaluation measure may lead to overfitting. In comparison with associative classification, traditional rule-based classifiers, such as C4.5, FOIL and RIPPER, are substantially faster but their accuracy, in most cases, may not be as high. In this paper, we propose a new classification approach, CPAR (Classification based on Predictive Association Rules), which combines the advantages of both associative classification and traditional rule-based classification. Instead of generating a large number of candidate rules as in associative classification, CPAR adopts a greedy algorithm to generate rules directly from training data. Moreover, CPAR generates and tests more rules than traditional rule-based classifiers to avoid missing important rules. To avoid overfitting, CPAR uses expected accuracy to evaluate each rule and uses the best k rules in prediction.
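
The two ideas in the last sentence of the abstract, scoring each rule by an expected accuracy and predicting with the best k rules per class, can be illustrated with a minimal sketch. The abstract does not define the accuracy estimate; the Laplace-corrected form used here is an assumption, and the function names, the rule representation (a set of attribute-value strings, a class label, and a score), and the toy rules are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def laplace_accuracy(n_correct, n_covered, n_classes):
    """Laplace-corrected accuracy estimate for a rule (assumed form).

    n_correct: covered training examples that belong to the rule's class
    n_covered: all training examples that satisfy the rule body
    n_classes: number of distinct class labels
    """
    return (n_correct + 1) / (n_covered + n_classes)


def predict_best_k(rules, example, k=5):
    """Pick the class whose best k matching rules have the highest mean score.

    rules:   iterable of (antecedent, label, expected_accuracy), where
             antecedent is a frozenset of attribute-value strings
    example: set of attribute-value strings describing one instance
    Returns None when no rule matches the example.
    """
    scores_by_class = defaultdict(list)
    for antecedent, label, accuracy in rules:
        if antecedent <= example:  # every condition of the rule body holds
            scores_by_class[label].append(accuracy)
    if not scores_by_class:
        return None

    def mean_of_best_k(label):
        best = sorted(scores_by_class[label], reverse=True)[:k]
        return sum(best) / len(best)

    return max(scores_by_class, key=mean_of_best_k)


# Hypothetical toy rules; the first score comes from the estimate above.
rules = [
    (frozenset({"outlook=sunny"}), "no", laplace_accuracy(8, 10, 2)),  # 0.75
    (frozenset({"windy=true"}), "no", 0.6),
    (frozenset({"humidity=normal"}), "yes", 0.7),
]
example = {"outlook=sunny", "windy=true", "humidity=normal"}
```

With these toy rules, `predict_best_k(rules, example, k=1)` returns `"no"`, because the single best rule decides. With `k=2` it returns `"yes"`, because the weaker second rule lowers the mean for class `"no"` to 0.675. Averaging over several rules rather than trusting the single most confident one is the kind of smoothing the abstract refers to when it mentions avoiding overfitting.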

Jiawei Han | Xiaoxin Yin

[1] Johannes Gehrke, et al. RainForest—A Framework for Fast Decision Tree Construction of Large Datasets, 1998, Data Mining and Knowledge Discovery.

[2] Ramakrishnan Srikant, et al. Fast algorithms for mining association rules, 1998, VLDB 1998.

[3] R. Mike Cameron-Jones, et al. FOIL: A Midterm Report, 1993, ECML.

[4] Jian Pei, et al. CMAR: accurate and efficient classification based on multiple class-association rules, 2001, Proceedings 2001 IEEE International Conference on Data Mining.

[5] Wynne Hsu, et al. Integrating Classification and Association Rule Mining, 1998, KDD.

[6] Peter Clark, et al. Rule Induction with CN2: Some Recent Improvements, 1991, EWSL.

[7] Jian Pei, et al. Mining frequent patterns without candidate generation, 2000, SIGMOD '00.

[8] J. Ross Quinlan, et al. C4.5: Programs for Machine Learning, 1992.

[9] William W. Cohen. Fast Effective Rule Induction, 1995, ICML.

[10] Aiko M. Hormann, et al. Programs for Machine Learning. Part I, 1962, Inf. Control.