Frequent Itemset Border Approximation by Dualization

FIBAD is an approach for computing approximate borders of frequent itemsets by leveraging dualization and the computation of approximate minimal transversals of hypergraphs. The distinctive feature of FIBAD's theoretical foundations is approximate dualization, in which a new function $$\widetilde{f}$$ is defined to compute the approximate negative border. From a methodological point of view, the function $$\widetilde{f}$$ is implemented by the AMTHR method, which consists of a reduction of the hypergraph and the computation of its minimal transversals. For evaluation purposes, we study the sensitivity of FIBAD to AMTHR by replacing it with two other algorithms that compute approximate minimal transversals. We also compare our approximate dualization-based method with an existing approach that computes the approximate borders directly, without dualization. The experimental results show that our method outperforms the other methods, as it produces the borders of highest quality.
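
To make the dualization step concrete, the following minimal sketch illustrates the exact (non-approximate) relation that approaches of this kind build on: the negative border (minimal infrequent itemsets) equals the set of minimal transversals of the hypergraph whose hyperedges are the complements of the positive border (maximal frequent itemsets). It is not FIBAD, AMTHR, or the function $$\widetilde{f}$$; the function names `minimal_transversals` and `negative_border`, the brute-force enumeration, and the toy data are all assumptions made for illustration only.

```python
from itertools import combinations


def minimal_transversals(hyperedges, vertices):
    """Brute-force minimal hitting sets (illustrative only, exponential time).

    Candidates are enumerated by increasing size, so any candidate that
    contains an already found transversal cannot be minimal and is skipped.
    """
    found = []
    ordered = sorted(vertices)
    for k in range(len(ordered) + 1):
        for cand in combinations(ordered, k):
            s = frozenset(cand)
            if any(t <= s for t in found):
                continue  # contains a smaller transversal: not minimal
            if all(s & e for e in hyperedges):
                found.append(s)  # s intersects every hyperedge
    return found


def negative_border(positive_border, items):
    """Exact dualization: minimal transversals of the complemented border."""
    universe = frozenset(items)
    complements = [universe - frozenset(m) for m in positive_border]
    return minimal_transversals(complements, universe)


# Hypothetical toy data: items a..d, maximal frequent itemsets {a,b,c} and {c,d}.
items = "abcd"
positive = [frozenset("abc"), frozenset("cd")]
for itemset in negative_border(positive, items):
    print(sorted(itemset))
```

On this toy input the complements are {d} and {a,b}, so the minimal transversals are {a,d} and {b,d}: exactly the itemsets that are contained in no maximal frequent itemset while all of their proper subsets are. An approximate method would replace the exact `minimal_transversals` step with an approximate one.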
