An improved data mining approach using predictive itemsets

In this paper, we present a mining algorithm to improve the efficiency of finding large itemsets. Based on the concept of prediction proposed in the (n,p) algorithm, our method considers the data dependency in the given transactions to predict promising and non-promising candidate itemsets. Our method estimates for each level a different support threshold that is derived from a data dependency parameter and determines whether an item should be included in a promising candidate itemset directly. In this way, we maintain the efficiency of finding large itemsets by reducing the number of scanning the input dataset and the number candidate items. Experimental results show our method has a better efficiency than the apriori and the (n,p) algorithms when the minimum support value is small.

[1]  Jian Pei,et al.  Mining frequent patterns without candidate generation , 2000, SIGMOD 2000.

[2]  Philip S. Yu,et al.  Using a Hash-Based Method with Transaction Trimming for Mining Association Rules , 1997, IEEE Trans. Knowl. Data Eng..

[3]  Christophe Rigotti,et al.  A condensed representation to find frequent patterns , 2001, PODS '01.

[4]  Tomasz Imielinski,et al.  Database Mining: A Performance Perspective , 1993, IEEE Trans. Knowl. Data Eng..

[5]  Janusz R. Getta,et al.  A parameterised algorithm for mining association rules , 2001, Proceedings 12th Australasian Database Conference. ADC 2001.

[6]  Heikki Mannila,et al.  Efficient Algorithms for Discovering Association Rules , 1994, KDD Workshop.

[7]  Gerd Stumme,et al.  Mining frequent patterns with counting inference , 2000, SKDD.

[8]  Jia-Ling Koh,et al.  An Efficient Approach for Maintaining Association Rules Based on Adjusting FP-Tree Structures1 , 2004, DASFAA.

[9]  Ramakrishnan Srikant,et al.  Mining Association Rules with Item Constraints , 1997, KDD.

[10]  Osmar R. Zaïane,et al.  COFI-tree Mining: A New Approach to Pattern Growth with Reduced Candidacy Generation , 2003, FIMI.

[11]  Yong Qiu,et al.  An improved algorithm of mining from FP-tree , 2004, Proceedings of 2004 International Conference on Machine Learning and Cybernetics (IEEE Cat. No.04EX826).

[12]  Jian Pei,et al.  CLOSET: An Efficient Algorithm for Mining Frequent Closed Itemsets , 2000, ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery.

[13]  Ramakrishnan Srikant,et al.  Fast Algorithms for Mining Association Rules in Large Databases , 1994, VLDB.

[14]  Marek Wojciechowski,et al.  Dataset Filtering Techniques in Constraint-Based Frequent Pattern Mining , 2002, Pattern Detection and Discovery.

[15]  Ramakrishnan Srikant,et al.  Mining generalized association rules , 1995, Future Gener. Comput. Syst..

[16]  Ramakrishnan Srikant,et al.  Mining quantitative association rules in large relational tables , 1996, SIGMOD '96.

[17]  Jiawei Han,et al.  Discovery of Multiple-Level Association Rules from Large Databases , 1995, VLDB.

[18]  Tomasz Imielinski,et al.  Mining association rules between sets of items in large databases , 1993, SIGMOD Conference.

[19]  Christie I. Ezeife,et al.  Mining Incremental Association Rules with Generalized FP-Tree , 2002, Canadian Conference on AI.

[20]  Ramakrishnan Srikant,et al.  Mining sequential patterns , 1995, Proceedings of the Eleventh International Conference on Data Engineering.

[21]  Philip S. Yu,et al.  Data Mining: An Overview from a Database Perspective , 1996, IEEE Trans. Knowl. Data Eng..

[22]  Mohammed J. Zaki,et al.  CHARM: An Efficient Algorithm for Closed Itemset Mining , 2002, SDM.