Generalized itemset discovery by means of opportunistic aggregation

Association rule extraction is a widely used exploratory technique which has been exploited in different contexts (e.g., network traffic characterization, biological data, medical images). However, association rule extraction, driven by support and confidence constraints, entails (i) generating a huge number of rules which are difficult to analyze, or (ii) pruning rare itemsets, even if their hidden knowledge might be relevant. To address the above issues, this chapter presents a novel algorithm, called GenIO (GENeralized Itemset DiscOverer), to analyze correlation among data by means of generalized itemsets, which provide a powerful tool to efficiently extract hidden knowledge discarded by previous approaches. The proposed technique exploits (user-provided) taxonomies to drive the pruning phase of the extraction process. Instead of extracting itemsets for all levels of the taxonomy and post-pruning them, the GenIO algorithm performs a support-driven opportunistic aggregation of itemsets. Generalized itemsets are extracted only if items at a lower level in the taxonomy are below the support threshold. Experiments performed in the network traffic domain show the efficiency and the effectiveness of the proposed algorithm.
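The idea of support-driven opportunistic aggregation can be illustrated with a small sketch. This is not the GenIO algorithm itself, whose details are not given here; it is a simplified, item-level approximation under stated assumptions: the taxonomy is an acyclic child-to-parent dictionary, the support threshold is an absolute transaction count, and the function name, parameter names, and sample traffic data are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def opportunistic_itemsets(transactions, parent, min_support, max_size=2):
    """Toy sketch: climb the taxonomy only for items below min_support.

    transactions: iterable of item collections (hypothetical input format)
    parent:       dict mapping an item to its taxonomy parent (assumed acyclic)
    min_support:  absolute count threshold (an assumption of this sketch)
    """
    current = [set(t) for t in transactions]

    # Replace infrequent items by their parent until no infrequent item
    # can be generalized any further. Frequent items are never generalized.
    while True:
        counts = Counter(item for t in current for item in t)
        rare = {i for i, c in counts.items() if c < min_support and i in parent}
        if not rare:
            break
        current = [{parent[i] if i in rare else i for i in t} for t in current]

    # Count itemsets (brute force, up to max_size) over the rewritten
    # transactions and keep those that reach the threshold.
    frequent = {}
    for k in range(1, max_size + 1):
        counts = Counter()
        for t in current:
            for combo in combinations(sorted(t), k):
                counts[combo] += 1
        frequent.update({s: n for s, n in counts.items() if n >= min_support})
    return frequent


# Hypothetical network-traffic example.
taxonomy = {
    "ip:10.0.0.1": "subnet:10.0.0.0/24",
    "ip:10.0.0.2": "subnet:10.0.0.0/24",
    "ip:10.0.0.3": "subnet:10.0.0.0/24",
    "port:80": "port:well-known",
    "port:22": "port:well-known",
}
flows = [
    ["port:80", "ip:10.0.0.1"],
    ["port:80", "ip:10.0.0.2"],
    ["port:80", "ip:10.0.0.3"],
    ["port:22", "ip:10.0.0.1"],
]
found = opportunistic_itemsets(flows, taxonomy, min_support=2)
```

In this example the two addresses that each occur once are aggregated into their subnet, so the itemset pairing `port:80` with the subnet reaches the threshold, while the frequent address `ip:10.0.0.1` stays at leaf level. A consequence of this item-level simplification is that a generalized item covers only its infrequent descendants.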
