Condensed representation of frequent itemsets

Pattern explosion remains one of the major problems in pattern mining: when a database is analyzed with a predefined minimum support threshold, mining algorithms produce very large numbers of patterns. Our approach to this problem automatically infers variables from the patterns found, so that those patterns can be generalized and represented compactly. We introduce the novel concept of meta-patterns and present the RECAP algorithm. Meta-patterns can take several forms, and the sets of patterns can be grouped according to different criteria. These choices involve a trade-off between the expressiveness and the compactness of the resulting representation. On the tested dataset, the proposed solution reduces the number of patterns found to less than half.
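To make both the problem and the idea of generalizing patterns through variables concrete, the Python sketch below first enumerates all frequent itemsets of a toy transaction database by brute force. Because every subset of a frequent itemset is itself frequent, the output grows quickly. The sketch then merges patterns that differ in a single item into a meta-pattern with one variable `?X`. The toy data, the function names (`frequent_itemsets`, `group_into_meta_patterns`, `expand`), the single-variable meta-pattern form and the greedy grouping criterion are all hypothetical choices made for illustration. This is not the RECAP algorithm, whose details are not given here.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Brute-force frequent-itemset mining: return every itemset whose
    support (number of transactions containing it) is >= min_support."""
    items = sorted(set().union(*transactions))
    frequent = {}
    for k in range(1, len(items) + 1):
        level_found = False
        for candidate in combinations(items, k):
            itemset = frozenset(candidate)
            support = sum(1 for t in transactions if itemset <= t)
            if support >= min_support:
                frequent[itemset] = support
                level_found = True
        if not level_found:
            # Anti-monotonicity: if no k-itemset is frequent,
            # no (k+1)-itemset can be frequent either.
            break
    return frequent


def group_into_meta_patterns(patterns):
    """Hypothetical greedy grouping (NOT the RECAP algorithm): patterns that
    share all items but one are merged into a meta-pattern
    (shared_items, bindings), read as shared_items + {?X}, ?X in bindings."""
    remaining = set(patterns)
    metas = []
    while remaining:
        groups = {}
        for pattern in remaining:
            for item in pattern:
                # Context = the pattern with one item replaced by ?X.
                groups.setdefault(pattern - {item}, set()).add(item)
        # Pick the context that covers the most remaining patterns;
        # ties are broken deterministically by the sorted context.
        context, values = max(
            groups.items(), key=lambda kv: (len(kv[1]), sorted(kv[0]))
        )
        metas.append((tuple(sorted(context)), tuple(sorted(values))))
        remaining -= {context | {v} for v in values}
    return metas


def expand(metas):
    """Instantiate every meta-pattern back into its concrete itemsets."""
    return {frozenset(ctx) | {v} for ctx, values in metas for v in values}


# Hypothetical toy database.
transactions = [{"a", "b", "c"}, {"a", "b", "c"}, {"a", "b", "d"}, {"a", "c"}]
freq = frequent_itemsets(transactions, min_support=2)
metas = group_into_meta_patterns(freq)
print(len(freq), len(metas))  # 7 4
for ctx, values in metas:
    print(ctx, values)
```

On this toy database, seven frequent itemsets are covered by four groups. For example, `((), ('a', 'b', 'c'))` stands for `{?X}` with `?X` in `{a, b, c}` and replaces three singletons. The grouping keeps track of which itemsets are frequent but discards their individual support counts. This illustrates the trade-off between expressiveness and compactness described above.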
