Efficient Detection of Local Interactions in the Cascade Model

Detecting interactions among data items is an essential part of knowledge discovery. The cascade model is a rule induction methodology based on levelwise expansion of an itemset lattice. It detects positive and negative interactions in categorical data using a sum of squares criterion. An attribute-value pair is expressed as an item, and the BSS (between-groups sum of squares) value along a link in the itemset lattice indicates the strength of interaction among item pairs. A link with a strong interaction is represented as a rule. The items on the node constitute the left-hand side (LHS) of the rule, and the right-hand side (RHS) shows the veiled items that interact strongly with the item added along the link. This implies that a rule can be obtained without generating an itemset that contains the RHS items, which makes rule induction efficient: rule links can be detected dynamically while the lattice is being generated. Furthermore, the BSS value of the added attribute gives an upper bound on the BSS values of the other attributes along the link, which provides an effective pruning method for the itemset lattice. The method was implemented in the software DISCAS, in which input parameters control which items may appear in the LHS and the RHS. The algorithms are described, and an application is presented as an illustrative example.
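
As a rough illustration of the BSS value along a link, the following minimal sketch compares the distribution of one attribute on an upper node with its distribution on the lower node reached by adding an item. It assumes the Gini-style sum of squares for categorical data, under which BSS = (N_L / 2) Σ_a (p_L(a) − p_U(a))², where N_L is the number of records on the lower node; the abstract itself does not state this formula. The function names, the record layout, and the toy data are hypothetical and are not taken from DISCAS.

```python
from collections import Counter


def distribution(values):
    """Relative frequency of each category in a list of values."""
    n = len(values)
    return {v: c / n for v, c in Counter(values).items()}


def select(records, items):
    """Records that contain every (attribute, value) item in `items`."""
    return [r for r in records if all(r[a] == v for a, v in items)]


def bss_along_link(records, lhs_items, added_item, attribute):
    """BSS of `attribute` along the link from node `lhs_items` to the node
    obtained by adding `added_item` (assumed Gini-style definition)."""
    upper = select(records, lhs_items)
    lower = select(upper, [added_item])
    if not lower:
        return 0.0
    p_u = distribution([r[attribute] for r in upper])
    p_l = distribution([r[attribute] for r in lower])
    categories = set(p_u) | set(p_l)
    return len(lower) / 2 * sum(
        (p_l.get(a, 0.0) - p_u.get(a, 0.0)) ** 2 for a in categories
    )


# Hypothetical toy data: two categorical attributes, four records.
records = [
    {"A": "y", "B": "p"},
    {"A": "y", "B": "q"},
    {"A": "n", "B": "q"},
    {"A": "n", "B": "q"},
]

# Link from the root node (empty LHS) to the node with item A=y added.
bss_added = bss_along_link(records, [], ("A", "y"), "A")   # added attribute
bss_veiled = bss_along_link(records, [], ("A", "y"), "B")  # veiled attribute
print(bss_added, bss_veiled)
```

In this toy case the BSS of the added attribute A is at least that of the veiled attribute B, which is in line with the upper-bound property the abstract uses for pruning; the sketch only illustrates that property on one example and does not prove it.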
