Paper Information - On Maximal Frequent and Minimal Infrequent Sets in Binary Matrices

Given an m×n binary matrix A, a subset C of the columns is called t-frequent if A has at least t rows in which every entry belonging to C is non-zero. Let α denote the number of maximal t-frequent sets of A, and let β denote the number of minimal column subsets of A that are not t-frequent (the so-called minimal t-infrequent sets). We prove that the inequality α≤(m−t+1)β holds for any binary matrix A in which not all column subsets are t-frequent. This inequality is sharp, and it allows for an incremental quasi-polynomial algorithm for generating all minimal t-infrequent sets. We also prove that the analogous generation problem for maximal t-frequent sets is NP-hard. Finally, we discuss the complexity of generating closed frequent sets and some other related problems.
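
The definitions in the abstract can be checked by brute force on a tiny matrix. The Python sketch below is not the paper's algorithm (the abstract says only that an incremental quasi-polynomial algorithm exists). It enumerates all 2^n column subsets, so it is meant purely as an illustration. The function names `support` and `borders` and the example matrix are assumptions introduced here, not taken from the paper.

```python
from itertools import combinations


def support(matrix, cols):
    """Count rows whose entries are non-zero in every column of `cols`."""
    return sum(1 for row in matrix if all(row[c] for c in cols))


def borders(matrix, t):
    """Return (maximal t-frequent sets, minimal t-infrequent sets).

    Brute force over all column subsets; illustrative only.
    """
    n = len(matrix[0])
    all_sets = [frozenset(c)
                for k in range(n + 1)
                for c in combinations(range(n), k)]
    frequent = {s for s in all_sets if support(matrix, s) >= t}

    # Every subset of a frequent set is frequent, so it is enough to
    # test extensions and reductions by a single column.
    maximal_frequent = [
        s for s in frequent
        if all(s | {c} not in frequent for c in range(n) if c not in s)
    ]
    minimal_infrequent = [
        s for s in all_sets
        if s not in frequent and all(s - {c} in frequent for c in s)
    ]
    return maximal_frequent, minimal_infrequent


# Hypothetical 4x3 example matrix (m = 4 rows, n = 3 columns), threshold t = 2.
A = [
    [1, 1, 0],
    [1, 1, 1],
    [0, 1, 1],
    [1, 0, 1],
]
t = 2

max_freq, min_infreq = borders(A, t)
alpha, beta = len(max_freq), len(min_infreq)
bound = (len(A) - t + 1) * beta

print(alpha, beta, bound)  # prints: 3 1 3
```

In this example every pair of columns is 2-frequent and the full column set is not, so α = 3 and β = 1, and the bound (m−t+1)β = 3 is met with equality.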
