REDUCTION OF NUMBER OF ASSOCIATION RULES WITH INTER ITEMSET DISTANCE IN TRANSACTION DATABASES

Association rule discovery has been an important research problem in knowledge discovery and data mining. An association rule describes associations among sets of items that occur together in the transactions of a database. The association rule mining task consists of finding the frequent itemsets and then the rules, expressed as conditional implications, that satisfy prespecified threshold values of support and confidence. The interestingness of association rules is determined by these two measures, although other measures of interestingness such as lift and conviction are also used. However, the number of discovered association rules grows explosively, and many of these rules are insignificant. In this paper we introduce a new measure of interestingness called Inter Itemset Distance, or Spread, and implement it following the approach of the Apriori algorithm, with a view to reducing the number of discovered association rules in a meaningful manner. We analyze the working of the new algorithm and present its results in comparison with those of the conventional Apriori algorithm.
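
For readers unfamiliar with the four conventional measures named above, the following is a minimal sketch of how support, confidence, lift, and conviction are commonly computed for a single rule. The function names, the toy transactions, and the use of exact fractions are illustrative assumptions and are not taken from the paper. The Inter Itemset Distance (Spread) measure is not defined in this abstract, so it is deliberately not implemented here.

```python
from fractions import Fraction

# Toy transaction database (hypothetical data, for illustration only).
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk", "jam"},
]


def support(transactions, itemset):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return Fraction(hits, len(transactions))


def rule_measures(transactions, antecedent, consequent):
    """Standard interestingness measures for the rule antecedent -> consequent."""
    a, c = frozenset(antecedent), frozenset(consequent)
    s_a = support(transactions, a)
    s_c = support(transactions, c)
    s_ac = support(transactions, a | c)
    if s_a == 0:
        raise ValueError("antecedent never occurs; confidence is undefined")
    confidence = s_ac / s_a
    # Lift compares the rule's confidence with the consequent's base rate.
    lift = confidence / s_c if s_c != 0 else None
    # Conviction is unbounded when the rule never fails (confidence == 1).
    conviction = (1 - s_c) / (1 - confidence) if confidence != 1 else None
    return {
        "support": s_ac,
        "confidence": confidence,
        "lift": lift,
        "conviction": conviction,
    }


if __name__ == "__main__":
    for name, value in rule_measures(TRANSACTIONS, {"bread"}, {"milk"}).items():
        print(f"{name}: {value}")
```

On this toy data the rule {bread} -> {milk} has support 3/5 and confidence 3/4, yet its lift is 15/16, slightly below 1. A rule can therefore pass support and confidence thresholds while carrying little information, which is the kind of insignificant rule that additional interestingness measures aim to filter out.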
