Optimization of constrained frequent set queries with 2-variable constraints

Currently, there is tremendous interest in providing ad-hoc mining capabilities in database management systems. As a first step towards this goal, in [15] we proposed an architecture for supporting constraint-based, human-centered, exploratory mining of various kinds of rules including associations, introduced the notion of constrained frequent set queries (CFQs), and developed effective pruning optimizations for CFQs with 1-variable (1-var) constraints. While 1-var constraints are useful for constraining the antecedent and consequent separately, many natural examples of CFQs illustrate the need for constraining the antecedent and consequent jointly, for which 2-variable (2-var) constraints are indispensable. Developing pruning optimizations for CFQs with 2-var constraints is the subject of this paper. But this is a difficult problem because: (i) in 2-var constraints, both variables keep changing and, unlike 1-var constraints, there is no fixed target for pruning; (ii) as we show, “conventional” monotonicity-based optimization techniques do not apply effectively to 2-var constraints. The contributions are as follows. (1) We introduce a notion of quasi-succinctness, which allows a quasi-succinct 2-var constraint to be reduced to two succinct 1-var constraints for pruning. (2) We characterize the class of 2-var constraints that are quasi-succinct. (3) We develop heuristic techniques for non-quasi-succinct constraints. Experimental results show the effectiveness of all our techniques. (4) We propose a query optimizer for CFQs and show that for a large class of constraints, the computation strategy generated by the optimizer is ccc-optimal, i.e., minimizing the effort incurred w.r.t. constraint checking and support counting.
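To make contribution (1) concrete, here is a minimal sketch of how one 2-var constraint can be relaxed into two item-level 1-var filters. It is an illustration only, not the paper's algorithm: the constraint max(S.A) <= min(T.B), the item table, and every function name are assumptions chosen for the example. The reasoning is that if a pair (S, T) of non-empty sets of frequent items satisfies the constraint, then every item of S has an A value no larger than the largest B value among frequent items, and every item of T has a B value no smaller than the smallest A value among frequent items.

```python
from itertools import combinations

# Hypothetical frequent items: name -> (A value, B value).
FREQUENT_ITEMS = {
    "a": (2, 5),
    "b": (3, 2),
    "c": (7, 4),
    "d": (9, 1),
}


def holds(S, T, items):
    """Illustrative 2-var constraint (an assumption): max(S.A) <= min(T.B)."""
    return max(items[i][0] for i in S) <= min(items[i][1] for i in T)


def reduce_to_one_var(items):
    """Derive two item-level 1-var filters that every valid (S, T) must pass."""
    max_b = max(b for _, b in items.values())  # largest B among frequent items
    min_a = min(a for a, _ in items.values())  # smallest A among frequent items
    s_items = {i for i, (a, _) in items.items() if a <= max_b}
    t_items = {i for i, (_, b) in items.items() if b >= min_a}
    return s_items, t_items


def nonempty_subsets(names):
    """Yield every non-empty subset of names as a sorted tuple."""
    names = sorted(names)
    for k in range(1, len(names) + 1):
        yield from combinations(names, k)


def valid_pairs(s_pool, t_pool, items):
    """Brute-force all (S, T) pairs drawn from the pools that satisfy the constraint."""
    return {
        (S, T)
        for S in nonempty_subsets(s_pool)
        for T in nonempty_subsets(t_pool)
        if holds(S, T, items)
    }


s_items, t_items = reduce_to_one_var(FREQUENT_ITEMS)
everything = set(FREQUENT_ITEMS)
# Pruning with the two 1-var filters loses no valid pair.
assert valid_pairs(everything, everything, FREQUENT_ITEMS) == valid_pairs(
    s_items, t_items, FREQUENT_ITEMS
)
```

The brute-force comparison at the end only checks that the two filters are sound on this toy table. Support counting, and the question of which constraints admit such a reduction, are outside the sketch.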

[1]  Renée J. Miller,et al.  Association rules over interval data , 1997, SIGMOD '97.

[2]  Yasuhiko Morimoto,et al.  Data mining using two-dimensional optimized association rules: scheme, algorithms, and visualization , 1996, SIGMOD '96.

[3]  Ramakrishnan Srikant,et al.  Fast algorithms for mining association rules , 1998, VLDB 1998.

[4]  Hannu Toivonen,et al.  Sampling Large Databases for Association Rules , 1996, VLDB.

[5]  T. J. Watson,et al.  An Effective Hash-Based Algorithm for Mining Association Rules , 1995 .

[6]  Heikki Mannila,et al.  A database perspective on knowledge discovery , 1996, CACM.

[7]  Roberto J. Bayardo,et al.  Efficiently mining long patterns from databases , 1998, SIGMOD '98.

[8]  David B. Lomet,et al.  Bulletin of the Technical Committee on Data Engineering: Special Issue on Data Reduction Techniques (The New Jersey Data Reduction Report) , 2022 .

[9]  Christos Faloutsos,et al.  Ratio Rules: A New Paradigm for Fast, Quantifiable Data Mining , 1998, VLDB.

[10]  Laks V. S. Lakshmanan,et al.  Exploratory mining and pruning optimizations of constrained associations rules , 1998, SIGMOD '98.

[11]  Sunita Sarawagi,et al.  Integrating association rule mining with relational database systems: alternatives and implications , 1998, SIGMOD '98.

[12]  Jiawei Han,et al.  Metarule-Guided Mining of Multi-Dimensional Association Rules Using Data Cubes , 1997, KDD.

[13]  Philip S. Yu,et al.  An effective hash-based algorithm for mining association rules , 1995, SIGMOD '95.

[14]  Jiawei Han,et al.  Discovery of Multiple-Level Association Rules from Large Databases , 1995, VLDB.

[15]  Ramakrishnan Srikant,et al.  Mining Association Rules with Item Constraints , 1997, KDD.

[16]  Tomasz Imielinski,et al.  Mining association rules between sets of items in large databases , 1993, SIGMOD Conference.

[17]  Jiawei Han,et al.  Maintenance of discovered association rules in large databases: an incremental updating technique , 1996, Proceedings of the Twelfth International Conference on Data Engineering.

[18]  Ramakrishnan Srikant,et al.  Fast Algorithms for Mining Association Rules in Large Databases , 1994, VLDB.

[19]  Surajit Chaudhuri.  Data Mining and Database Systems: Where is the Intersection? , 1998, IEEE Data Eng. Bull..

[20]  Heikki Mannila,et al.  Finding interesting rules from large sets of discovered association rules , 1994, CIKM '94.

[21]  Rajeev Motwani,et al.  Beyond market baskets: generalizing association rules to correlations , 1997, SIGMOD '97.

[22]  Sridhar Ramaswamy,et al.  On the Discovery of Interesting Patterns in Association Rules , 1998, VLDB.

[23]  Ramakrishnan Srikant,et al.  Mining generalized association rules , 1995, Future Gener. Comput. Syst..

[24]  Abraham Silberschatz,et al.  Database systems—breaking out of the box , 1997, SGMD.

[25]  Chris Clifton,et al.  Query flocks: a generalization of association-rule mining , 1998, SIGMOD '98.

[26]  Ramakrishnan Srikant,et al.  Mining quantitative association rules in large relational tables , 1996, SIGMOD '96.