Misleading Generalized Itemset discovery

Frequent generalized itemset mining is a data mining technique used to discover a high-level view of interesting knowledge hidden in the analyzed data. By exploiting a taxonomy, patterns can be extracted at any level of abstraction. However, the mined set may include misleading high-level patterns. This paper proposes a novel generalized itemset type, namely the Misleading Generalized Itemset (MGI). Each MGI, denoted as X ▷ E, represents a frequent generalized itemset X and the set E of its low-level frequent descendants whose correlation type contrasts with that of X. To allow experts to analyze the misleading high-level data correlations separately and to exploit such knowledge by making different decisions, MGIs are extracted only if the low-level descendant itemsets that represent contrasting correlations cover almost the same portion of data as the high-level (misleading) ancestor. An algorithm to mine MGIs on top of traditional generalized itemsets is also proposed. Experiments performed on both real and synthetic datasets demonstrate the effectiveness and efficiency of the proposed approach.
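The MGI definition above can be made concrete with a small sketch. Everything specific in it is an assumption for illustration and is not taken from the abstract: the toy taxonomy and transactions, the use of the Kulczynski measure to assign a correlation type, the thresholds `MIN_SUP`, `MAX_NEG`, `MIN_POS`, and `MIN_COVERAGE`, and the helper names. It is a brute-force check on one candidate itemset, not the mining algorithm proposed in the paper.

```python
from itertools import product

# Hypothetical two-level taxonomy: leaf item -> parent category.
TAXONOMY = {"tshirt": "clothes", "jacket": "clothes",
            "sandals": "shoes", "boots": "shoes"}

# Hypothetical transactions over leaf items.
TRANSACTIONS = [{"tshirt", "sandals"}] * 2 + [{"jacket"}] * 8 + [{"boots"}] * 8

MIN_SUP = 2                   # assumed absolute support threshold
MAX_NEG, MIN_POS = 0.4, 0.6   # assumed correlation-type thresholds
MIN_COVERAGE = 0.9            # assumed reading of "almost the same portion of data"


def leaves(item):
    """Leaf items covered by `item` (the item itself if it is a leaf)."""
    children = {c for c, p in TAXONOMY.items() if p == item}
    return children or {item}


def cover(itemset):
    """Ids of the transactions that match every (generalized) item in `itemset`."""
    return {tid for tid, t in enumerate(TRANSACTIONS)
            if all(leaves(i) & t for i in itemset)}


def kulczynski(pair):
    """Kulczynski correlation of a 2-itemset (measure chosen as an assumption)."""
    a, b = pair
    joint = len(cover(pair))
    return 0.5 * (joint / len(cover((a,))) + joint / len(cover((b,))))


def correlation_type(pair):
    k = kulczynski(pair)
    if k >= MIN_POS:
        return "positive"
    if k <= MAX_NEG:
        return "negative"
    return "null"


def misleading(ancestor):
    """Return (E, coverage) if `ancestor` qualifies as an MGI here, else None."""
    anc_cover = cover(ancestor)
    if len(anc_cover) < MIN_SUP:
        return None
    opposite = {"positive": "negative",
                "negative": "positive"}.get(correlation_type(ancestor))
    if opposite is None:
        return None
    # E: frequent low-level descendants with the contrasting correlation type.
    contrasting = [d for d in product(*(sorted(leaves(i)) for i in ancestor))
                   if len(cover(d)) >= MIN_SUP and correlation_type(d) == opposite]
    if not contrasting:
        return None
    covered = set().union(*(cover(d) for d in contrasting))
    coverage = len(covered) / len(anc_cover)
    return (contrasting, coverage) if coverage >= MIN_COVERAGE else None


print(misleading(("clothes", "shoes")))
```

In this toy data, {clothes, shoes} is negatively correlated at the category level, while its only frequent descendant, {tshirt, sandals}, is positively correlated and matches every transaction that supports the ancestor. Under the assumed thresholds the ancestor is therefore reported as misleading.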
