Generating dual-bounded hypergraphs

This article surveys some recent results on the generation of implicitly given hypergraphs and their applications in Boolean and integer programming, data mining, reliability theory, and combinatorics. Given a monotone property π over the subsets of a finite set V, we consider the problem of incrementally generating the family F_π of all minimal subsets satisfying property π, when π is given by a polynomial-time satisfiability oracle. For a number of interesting monotone properties, the family F_π turns out to be uniformly dual-bounded, allowing for the incrementally efficient enumeration of the members of F_π. Important applications include the efficient generation of minimal infrequent sets of a database (data mining), minimal connectivity-ensuring collections of subgraphs from a given list (reliability theory), minimal feasible solutions to a system of monotone inequalities in integer variables (integer programming), minimal spanning collections of subspaces from a given list (linear algebra), and maximal independent sets in the intersection of matroids (combinatorial optimization). In contrast to these results, the analogous problem of generating the family of all maximal subsets not having property π is NP-hard for almost all applications mentioned above.
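
To make the oracle model concrete, the following is a minimal sketch of what the family F_π looks like for the data-mining application (minimal infrequent sets). It is a brute-force enumeration over subsets by increasing size, not the incrementally efficient method the article surveys; the names `minimal_sets` and `infrequent_oracle`, the toy database, and the threshold are all assumptions made for illustration.

```python
from itertools import combinations

def minimal_sets(ground, oracle):
    """Yield every minimal subset of `ground` for which the monotone
    predicate `oracle` is true. Brute force (exponential in |ground|):
    illustrates the definition of F_pi, not an efficient algorithm."""
    ground = sorted(ground)
    found = []
    # Subsets are visited by increasing size, so the first time a set
    # satisfies the property without containing an earlier hit, it is minimal.
    for k in range(len(ground) + 1):
        for combo in combinations(ground, k):
            x = frozenset(combo)
            if any(f <= x for f in found):
                continue  # x contains a known minimal set, so it is not minimal
            if oracle(x):
                found.append(x)
                yield x

def infrequent_oracle(rows, threshold):
    """Hypothetical oracle: X is 'infrequent' if fewer than `threshold`
    rows contain it. Supersets of infrequent sets are infrequent,
    so the property is monotone."""
    def oracle(x):
        return sum(1 for row in rows if x <= row) < threshold
    return oracle

# Toy database (assumed data): each row is a set of items.
rows = [frozenset("ab"), frozenset("ab"), frozenset("ac"),
        frozenset("bc"), frozenset("d")]
result = list(minimal_sets("abcd", infrequent_oracle(rows, threshold=2)))
```

Here `result` holds the minimal infrequent sets of the toy database one by one as they are discovered. The sketch only needs the oracle to answer membership queries, which mirrors the setting above where π is accessible solely through a satisfiability oracle.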
