Data mining, hypergraph transversals, and machine learning (extended abstract)

Several data mining problems can be formulated as problems of finding maximally specific sentences that are interesting in a database. We first show that this problem has a close relationship with the hypergraph transversal problem. We then analyze two algorithms that have previously been used in data mining, proving upper bounds on their complexity. The first algorithm is useful when the maximally specific interesting sentences are "small". We show that this algorithm can also be used to efficiently solve a special case of the hypergraph transversal problem, improving on previous results. The second algorithm uses a subroutine for hypergraph transversals and is applicable in more general situations, with complexity close to a lower bound for the problem. We also relate these problems to the model of exact learning in computational learning theory, and use the correspondence to derive some corollaries.
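The following Python sketch makes the link to hypergraph transversals concrete. It rests on two illustrative assumptions. First, sentences are itemsets over a finite set of items, ordered by inclusion. Second, "interesting" means "contained in at least `min_support` rows" of a toy transaction table. All names in the code are hypothetical.

The sketch runs an Apriori-style levelwise search, assumed here to stand in for the first algorithm, to collect the maximal interesting sets. It then checks that the minimal transversals of the complements of those sets are exactly the minimal non-interesting sets. The transversal routine is a brute-force enumeration chosen for clarity. It is not the transversal subroutine analyzed for the second algorithm.

```python
from itertools import combinations


def levelwise(items, is_interesting):
    """Apriori-style levelwise search (illustrative sketch).

    Assumes `is_interesting` is monotone: every subset of an interesting
    set is interesting. Returns the maximal interesting sets and the
    minimal non-interesting sets (the "negative border").
    """
    interesting, negative_border = set(), set()
    level = {frozenset()}
    while level:
        accepted = set()
        for s in level:
            if is_interesting(s):
                accepted.add(s)
            else:
                negative_border.add(s)
        interesting |= accepted
        # Next level: sets one item larger whose every immediate subset is interesting.
        level = {
            s | {i}
            for s in accepted
            for i in items
            if i not in s and all((s | {i}) - {j} in interesting for j in s | {i})
        }
    maximal = {s for s in interesting if not any(s < t for t in interesting)}
    return maximal, negative_border


def minimal_transversals(edges, vertices):
    """All minimal vertex sets meeting every edge (brute force, exponential)."""
    found = []
    for k in range(len(vertices) + 1):
        for combo in combinations(sorted(vertices), k):
            t = frozenset(combo)
            # Sets are enumerated by size, so any smaller minimal transversal
            # inside t has already been found.
            if all(t & e for e in edges) and not any(m <= t for m in found):
                found.append(t)
    return set(found)


# Hypothetical toy database; "interesting" = frequent with support >= 2.
transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c", "d"}]
items = {"a", "b", "c", "d"}
min_support = 2


def frequent(s):
    return sum(1 for t in transactions if s <= t) >= min_support


maximal, neg_border = levelwise(items, frequent)
complements = {frozenset(items) - m for m in maximal}
print(sorted(sorted(m) for m in maximal))     # [['a', 'b'], ['a', 'c'], ['b', 'c']]
print(sorted(sorted(n) for n in neg_border))  # [['a', 'b', 'c'], ['d']]
assert minimal_transversals(complements, items) == neg_border
```

On this data the complements of the maximal sets are {c, d}, {b, d} and {a, d}. Their minimal transversals are {d} and {a, b, c}, which are exactly the sets the levelwise search found to be minimal non-interesting. Brute-force enumeration grows exponentially with the number of items, which is why the cost of the transversal subroutine matters in the second algorithm.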
