PERFICT: Perturbed Frequent Itemset Based Classification Technique

This paper presents the Perturbed Frequent Itemset based Classification Technique (PERFICT), a novel associative classification approach based on perturbed frequent itemsets. Most existing associative classifiers work well on transactional data, where each record contains a set of boolean items, but they are generally less effective on relational data, which typically contains real-valued attributes. In PERFICT, we handle real-valued attributes by treating items as (attribute, value) pairs in which the value is not the original one but a range obtained by perturbing it by a small amount. We also propose our own similarity measure, which captures the nature of real-valued attributes and provides effective weights for the itemsets. The probabilistic contributions of different itemsets are taken into consideration during classification. Applications where such a technique is useful include signal classification, medical diagnosis, and handwriting recognition. Experiments conducted on datasets from the UCI Repository show that PERFICT is highly competitive in accuracy with popular associative classification methods.
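To make the idea of range-based (attribute, value) items concrete, the following is a minimal sketch and not the authors' implementation. It assumes a symmetric perturbation of width `epsilon` around each value and a plain containment test for support; the abstract does not specify the perturbation size, the similarity measure, or the weighting scheme, so the names `perturb`, `matches`, and `support` and all numeric values are hypothetical.

```python
from typing import Dict, List, Tuple

Record = Dict[str, float]
RangeItem = Tuple[str, float, float]  # (attribute, low, high)


def perturb(attribute: str, value: float, epsilon: float) -> RangeItem:
    """Turn an exact real value into a range-based item (assumed symmetric range)."""
    return (attribute, value - epsilon, value + epsilon)


def matches(record: Record, itemset: List[RangeItem]) -> bool:
    """A record supports an itemset if every attribute value lies in its range."""
    return all(
        attr in record and low <= record[attr] <= high
        for attr, low, high in itemset
    )


def support(records: List[Record], itemset: List[RangeItem]) -> float:
    """Fraction of records that fall inside every range of the itemset."""
    if not records:
        return 0.0
    return sum(matches(r, itemset) for r in records) / len(records)


# Illustrative data only; attribute names and values are made up.
records = [
    {"height": 1.70, "weight": 65.0},
    {"height": 1.72, "weight": 66.5},
    {"height": 1.90, "weight": 90.0},
]
itemset = [perturb("height", 1.71, 0.05), perturb("weight", 65.0, 2.0)]
print(support(records, itemset))
```

With exact-value items, the first two records would share no item because their values differ slightly; with perturbed ranges they both support the same itemset, which is what allows frequent itemsets to emerge from real-valued data.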
