A time-efficient breadth-first level-wise lattice-traversal algorithm to discover rare itemsets

This paper addresses the problem of mining rare itemsets. A central issue is the strategy for exploring the power set lattice. Assuming a power set lattice with the full set at the top and the empty set at the bottom, most algorithms adopt a bottom-up exploration, i.e. moving from smaller to larger sets. Although this approach is advantageous for frequent itemsets, it may be ill-suited to rare itemsets, which occur near the top of the lattice. We propose Rarity, a top-down breadth-first level-wise algorithm. Experimental results and comparisons are presented to characterize quantitatively the algorithm's performance and complexity. The algorithm is applied to several UCI benchmark and real-world datasets. A parallelization of the algorithm is outlined. Experiments show that this approach finds all rare non-zero itemsets in less time than other solutions, at the expense of higher memory demand.
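To make the top-down, level-wise idea concrete, the following is a minimal sketch and not the Rarity algorithm itself, whose data structures and pruning rules are not described in this abstract. It assumes an absolute support threshold (an itemset is rare when fewer than `min_support` transactions contain it) and uses a hypothetical function name, `rare_itemsets_top_down`. It starts from the longest transactions and descends one level at a time, expanding only rare itemsets. Every superset of a rare itemset is itself rare, so each rare non-zero itemset is reached by descending from some transaction that contains it.

```python
from itertools import combinations


def rare_itemsets_top_down(transactions, min_support):
    """Illustrative top-down level-wise search (not the paper's Rarity code).

    Returns a dict mapping each rare, non-zero-support itemset to its
    support count. `min_support` is an absolute count (an assumption).
    """
    transactions = [frozenset(t) for t in transactions]

    def support(itemset):
        # Number of transactions that contain the itemset.
        return sum(1 for t in transactions if itemset <= t)

    # Group distinct transactions by size; they seed each level.
    by_size = {}
    for t in set(transactions):
        by_size.setdefault(len(t), set()).add(t)

    max_len = max((len(t) for t in transactions), default=0)
    rare = {}
    candidates = set()

    # Walk the lattice from the largest itemsets down to single items.
    for k in range(max_len, 0, -1):
        candidates |= by_size.get(k, set())
        next_candidates = set()
        for itemset in candidates:
            count = support(itemset)
            if count < min_support:
                rare[itemset] = count
                if k > 1:
                    # Only rare itemsets are expanded: subsets of a
                    # frequent itemset are frequent as well.
                    next_candidates.update(
                        frozenset(c) for c in combinations(itemset, k - 1)
                    )
        candidates = next_candidates
    return rare


if __name__ == "__main__":
    data = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"a"}, {"b", "d"}]
    for itemset, count in rare_itemsets_top_down(data, min_support=2).items():
        print(sorted(itemset), count)
```

Because every candidate is a subset of some transaction, zero-support itemsets are never generated. The sketch recounts support by scanning all transactions for each candidate, which keeps it short but is far from time-efficient; it illustrates the traversal order only.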
