IMine: Index Support for Item Set Mining

This paper presents the IMine index, a general and compact structure that tightly integrates item set extraction into a relational DBMS. Since no constraint is enforced during the index creation phase, IMine provides a complete representation of the original database. To reduce the I/O cost, data accessed together during the same extraction phase are clustered on the same disk block. The IMine index structure can be efficiently exploited by different item set extraction algorithms. In particular, IMine data access methods currently support the FP-growth and LCM v.2 algorithms, but they can straightforwardly support the enforcement of various constraint categories. The IMine index has been integrated into the PostgreSQL DBMS and exploits its physical-level access methods. Experiments on both sparse and dense data distributions show the efficiency of the proposed index and its linear scalability, even for large datasets. Item set mining supported by the IMine index shows performance that is always comparable with, and sometimes better than, that of state-of-the-art algorithms accessing data in flat files.
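To make the task concrete, the following minimal sketch shows what item set extraction under a minimum support constraint computes. It is a brute-force illustration only: it is not the IMine index, FP-growth, or LCM v.2, and the function name `frequent_item_sets`, the parameter `min_support`, and the toy transactions are hypothetical names and data chosen for this example.

```python
from collections import Counter
from itertools import combinations


def frequent_item_sets(transactions, min_support):
    """Return {item set: support count} for every item set that occurs
    in at least `min_support` transactions.

    Brute force: enumerates every subset of every transaction, so it is
    only usable on tiny inputs. Illustrative assumption, not the paper's method.
    """
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))  # drop duplicate items within a transaction
        for size in range(1, len(items) + 1):
            for subset in combinations(items, size):
                counts[frozenset(subset)] += 1
    # Apply the support constraint after counting
    return {s: c for s, c in counts.items() if c >= min_support}


# Toy database of four transactions (hypothetical data)
transactions = [
    ["a", "b", "c"],
    ["a", "b"],
    ["a", "c"],
    ["b", "c"],
]
result = frequent_item_sets(transactions, min_support=2)
for item_set, support in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(item_set), support)
```

In this sketch the support constraint is applied only after counting, so the counts themselves do not depend on any threshold. This loosely mirrors the abstract's point that no constraint is enforced when the index is created, which lets the same stored data serve extractions with different constraints.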
