Paper Information - Efficient Frequent Pattern Mining in Relational Databases

Efficient Frequent Pattern Mining in Relational Databases

Data mining on large relational databases has gained popularity, and its significance is well recognized. However, SQL-based data mining is known to perform worse than specialized implementations, both because of the prohibitive cost of extracting knowledge and because of the lack of suitable declarative query language support. We investigate SQL-based approaches to the problem of finding frequent patterns in a transaction table, including an algorithm that we recently proposed, called Propad (PROjection PAttern Discovery). Propad differs fundamentally from the Apriori-like candidate generation-and-test approach: it successively projects the transaction table into frequent itemsets, which avoids making multiple passes over the large original transaction table and generating huge sets of candidates. We evaluated performance on a DBMS (IBM DB2 UDB EEE V8) and compared the results with the K-Way join approach proposed in [Sarawagi et al., 1998] and the SQL-based FP-tree approach proposed in [Shang et al., 2004]. The experimental results show that our algorithm performs efficiently.
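The projection idea described in the abstract can be illustrated with a minimal in-memory sketch. This is not the authors' Propad implementation, which is expressed in SQL against a transaction table; it is a generic pattern-growth routine in Python, and the names `mine_by_projection`, `grow`, and `min_support` are hypothetical, chosen only for illustration. Each recursive step counts items in the current projected table alone, then narrows that table to the transactions containing the chosen item, so no candidate itemsets are generated and tested against the full original table.

```python
from collections import Counter


def mine_by_projection(transactions, min_support):
    """Return {frozenset(itemset): support} for every frequent itemset.

    Illustrative sketch only: a generic projection-based (pattern-growth)
    miner, not the SQL-based Propad algorithm from the paper.
    """
    results = {}

    def grow(prefix, projected):
        # Count item supports within the projected table only.
        counts = Counter(item for t in projected for item in t)
        for item in sorted(counts):
            support = counts[item]
            if support < min_support:
                continue
            pattern = prefix + (item,)
            results[frozenset(pattern)] = support
            # Project: keep transactions that contain `item`, and within
            # them only the items ordered after `item`, so that each
            # itemset is enumerated exactly once.
            sub = [tuple(i for i in t if i > item) for t in projected if item in t]
            sub = [t for t in sub if t]  # drop transactions left empty
            grow(pattern, sub)

    # Deduplicate and sort items inside each transaction before mining.
    grow((), [tuple(sorted(set(t))) for t in transactions])
    return results


if __name__ == "__main__":
    table = [
        ["a", "b", "c"],
        ["a", "b"],
        ["a", "c"],
        ["b", "c"],
        ["a", "b", "c"],
    ]
    for itemset, support in sorted(
        mine_by_projection(table, 3).items(), key=lambda kv: sorted(kv[0])
    ):
        print(sorted(itemset), support)
```

In a relational setting, each projected table would presumably be materialized or expressed as a query over the transaction table; the list comprehension above only stands in for that step.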

Kai-Uwe Sattler | Ingolf Geist | Xuequn Shang

[1] Sunita Sarawagi, et al. Integrating Association Rule Mining with Relational Database Systems: Alternatives and Implications, 1998, SIGMOD '98.

[2] Kai-Uwe Sattler, et al. SQL based frequent pattern mining without candidate generation, 2004, SAC '04.

[3] Xindong Wu, et al. Association analysis with one scan of databases, 2002, 2002 IEEE International Conference on Data Mining, 2002. Proceedings.

[4] Philip S. Yu, et al. An effective hash-based algorithm for mining association rules, 1995, SIGMOD '95.

[5] Ralf Rantzau, et al. Processing frequent itemset discovery queries by division and set containment join operators, 2003, DMKD '03.

[6] Jian Pei, et al. Mining frequent patterns without candidate generation, 2000, SIGMOD '00.

[7] Kyuseok Shim, et al. Developing Tightly-Coupled Data Mining Applications on a Relational Database System, 1996, KDD.

[8] Kai-Uwe Sattler, et al. SQL database primitives for decision tree classifiers, 2001, CIKM '01.

[9] Sharma Chakravarthy, et al. Performance Evaluation and Optimization of Join Queries for Association Rule Mining, 1999, DaWaK.

[10] Giuseppe Psaila, et al. A New SQL-like Operator for Mining Association Rules, 1996, VLDB.

[11] Arun N. Swami, et al. Set-Oriented Data Mining in Relational Databases, 1995, Data Knowl. Eng.

[12] Masaru Kitsuregawa, et al. Parallel SQL Based Association Rule Mining on Large Scale PC Cluster: Performance Comparison with Directly Coded C Implementation, 1999, PAKDD.

[13] Masaru Kitsuregawa, et al. SQL Based Association Rule Mining Using Commercial RDBMS (IBM DB2 UBD EEE), 2000, DaWaK.