On Maximal Frequent and Minimal Infrequent Sets in Binary Matrices

Given an m×n binary matrix A, a subset C of the columns is called t-frequent if there are at least t rows in A in which all entries belonging to C are non-zero. Let α denote the number of maximal t-frequent sets of A, and let β denote the number of minimal column subsets of A that are not t-frequent (so-called minimal t-infrequent sets). We prove that the inequality α≤(m−t+1)β holds for any binary matrix A in which not all column subsets are t-frequent. This inequality is sharp, and it yields an incremental quasi-polynomial algorithm for generating all minimal t-infrequent sets. We also prove that the analogous generation problem for maximal t-frequent sets is NP-hard. Finally, we discuss the complexity of generating closed frequent sets and some other related problems.
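To make the definitions concrete, here is a small brute-force sketch in Python. It enumerates every column subset of a tiny matrix, counts α (maximal t-frequent sets) and β (minimal t-infrequent sets), and compares α with (m−t+1)β. It relies only on the fact that t-frequency is closed under taking subsets, so single-column extensions and deletions suffice to test maximality and minimality. This is exponential in n and is only an illustration of the definitions, not the incremental quasi-polynomial algorithm from the paper; the function names and the example matrix are hypothetical.

```python
from itertools import combinations


def support(A, cols):
    """Number of rows of A whose entries in every column of `cols` are non-zero."""
    return sum(1 for row in A if all(row[j] for j in cols))


def maximal_frequent_and_minimal_infrequent(A, t):
    """Brute force over all 2^n column subsets (illustration only)."""
    n = len(A[0])
    all_sets = [frozenset(c) for k in range(n + 1) for c in combinations(range(n), k)]
    frequent = {S for S in all_sets if support(A, S) >= t}
    # Subsets of a t-frequent set are t-frequent, so a frequent set is maximal
    # iff adding any single missing column makes it infrequent.
    maximal = [S for S in frequent
               if all(S | {j} not in frequent for j in range(n) if j not in S)]
    # Dually, an infrequent set is minimal iff removing any one column makes it frequent.
    minimal = [S for S in all_sets
               if S not in frequent and all(S - {j} in frequent for j in S)]
    return maximal, minimal


# Hypothetical 3x3 example matrix.
A = [[1, 1, 0],
     [1, 0, 1],
     [0, 1, 1]]
m = len(A)

for t in (1, 2):
    maximal, minimal = maximal_frequent_and_minimal_infrequent(A, t)
    alpha, beta = len(maximal), len(minimal)
    print(f"t={t}: alpha={alpha}, beta={beta}, (m-t+1)*beta={(m - t + 1) * beta}")
# t=1: alpha=3, beta=1, (m-t+1)*beta=3
# t=2: alpha=3, beta=3, (m-t+1)*beta=6
```

For t = 1 the three maximal frequent sets are the column pairs, and the only minimal infrequent set is {0, 1, 2}, so on this particular matrix the bound holds with equality.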
