Support driven opportunistic aggregation for generalized itemset extraction

Association rule extraction is a widely used exploratory technique that has been applied in diverse contexts (e.g., biological data, medical images). However, association rule extraction driven by support and confidence constraints entails either (i) generating a huge number of rules that are difficult to analyze, or (ii) pruning rare itemsets, even though the knowledge hidden in them might be relevant. To address these issues, this paper presents a novel frequent itemset mining algorithm, called GENIO (GENeralized Itemset DiscOverer), which analyzes correlations in data by means of generalized itemsets. Generalized itemsets provide a powerful tool for efficiently extracting hidden knowledge that previous approaches discard. The proposed technique exploits a user-provided taxonomy to drive the pruning phase of the extraction process. Instead of extracting itemsets at all levels of the taxonomy and post-pruning them, GENIO performs a support-driven opportunistic aggregation of itemsets: generalized itemsets are extracted only if itemsets at a lower level of the taxonomy fall below the support threshold. Experiments performed in the network traffic domain show the efficiency and effectiveness of the proposed algorithm.
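The idea of support-driven opportunistic aggregation can be illustrated with a small sketch. This is not the GENIO implementation: the function names (`ancestors`, `support`, `generalize`, `opportunistic_mining`), the representation of the taxonomy as a child-to-parent dictionary, and the toy network-traffic data (IP addresses generalized to a subnet, ports generalized to a service class) are all hypothetical. Candidates are enumerated naively rather than with an efficient miner, and an infrequent itemset is generalized by moving every item one level up the taxonomy at once; climbing stops when no item has a parent or when two items would merge into the same ancestor.

```python
from itertools import combinations


def ancestors(item, taxonomy):
    """Yield the item itself and each ancestor, following the child -> parent map."""
    while item is not None:
        yield item
        item = taxonomy.get(item)


def support(itemset, transactions, taxonomy):
    """Count transactions containing every item of `itemset`; a transaction is
    treated as also containing all ancestors of its items."""
    count = 0
    for t in transactions:
        extended = {a for x in t for a in ancestors(x, taxonomy)}
        if itemset <= extended:
            count += 1
    return count


def generalize(itemset, taxonomy):
    """Replace each item with its parent; items without a parent are kept."""
    return frozenset(taxonomy.get(x, x) for x in itemset)


def opportunistic_mining(transactions, taxonomy, min_sup, max_len=2):
    """Return {itemset: support}. Leaf-level itemsets are kept if frequent;
    a generalized itemset is mined only when its lower-level itemset is infrequent."""
    # Naive candidates: leaf-level combinations that co-occur in some transaction.
    candidates = set()
    for t in transactions:
        for k in range(1, max_len + 1):
            candidates.update(frozenset(c) for c in combinations(sorted(t), k))

    result = {}
    for itemset in candidates:
        current = itemset
        while True:
            sup = support(current, transactions, taxonomy)
            if sup >= min_sup:
                result[current] = sup
                break
            parent = generalize(current, taxonomy)
            # Stop if nothing can climb further or if two items collapse into one ancestor.
            if parent == current or len(parent) < len(current):
                break
            current = parent
    return result


# Hypothetical toy data: IP -> subnet, port -> service class.
TAXONOMY = {
    "10.0.0.1": "10.0.0.0/24",
    "10.0.0.2": "10.0.0.0/24",
    "port:80": "web",
    "port:443": "web",
}
TRANSACTIONS = [
    {"10.0.0.1", "port:80"},
    {"10.0.0.2", "port:443"},
    {"10.0.0.1", "port:80"},
    {"10.0.0.2", "port:22"},
]
RESULT = opportunistic_mining(TRANSACTIONS, TAXONOMY, min_sup=2)
```

With `min_sup=2`, `{port:443}` (support 1) is aggregated into `{web}` (support 3), and `{10.0.0.2, port:443}` into `{10.0.0.0/24, web}`, whereas `{10.0.0.0/24}` on its own is never produced because both of its children are already frequent. A post-pruning approach would instead mine every level of the taxonomy and discard redundant generalizations afterwards.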