Business data analysts commonly use complex ad hoc join queries over enterprise databases to understand and analyze a variety of enterprise-wide processes. However, formulating such queries effectively is challenging for human users, especially over databases with large, heterogeneous schemas. In this paper, we propose a novel approach that automatically recommends join queries from input-output specifications, i.e., input tables on which selection conditions are imposed and output tables whose attribute values must appear in the query result. The recommended join query graph includes (i) "intermediate" tables and (ii) join conditions that connect the input and output tables via those intermediate tables. Our method analyzes an existing query log over the enterprise database. Borrowing from program slicing, which extracts the parts of a program that affect the value of a given variable, we first extract "query slices" from each query in the log. Given a user specification, we then recombine appropriate slices into a new join query graph that connects the input and output tables via intermediate tables. Because many such graphs are possible, we propose and study several quality measures for choosing a good one; each measure captures the intuition that the log should contain sufficient evidence to support the recommended join query graph. An extensive study using the query log of a real enterprise database system demonstrates the viability of our approach for recommending join queries.
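The abstract does not spell out the slicing and recombination algorithm, so the following is only a rough illustration of the underlying idea under stated assumptions. It replaces slice extraction and recombination with a much simpler stand-in: every join condition seen in the log becomes an edge between two tables, each edge is weighted by how many logged queries use it, and the input and output tables are connected with a greedy Steiner-tree heuristic (cheapest paths with edge cost 1/support). The log format, the table names (`customers`, `orders`, `lineitem`, `parts`), the cost function, and the "weakest support" score are all hypothetical choices for this sketch; they are not the paper's slicing method or its quality measures.

```python
import heapq
import math
from collections import Counter, defaultdict

# Hypothetical log format (an assumption, not the paper's): each logged query is
# reduced to its equi-join conditions ((table_a, col_a), (table_b, col_b)).
JoinCond = tuple[tuple[str, str], tuple[str, str]]


def edge_support(log: list[list[JoinCond]]) -> Counter:
    """Count how many logged queries contain each join condition."""
    support: Counter = Counter()
    for query in log:
        # Sort each condition so a.x = b.y and b.y = a.x count as the same edge.
        for cond in {tuple(sorted(c)) for c in query}:
            support[cond] += 1
    return support


def cheapest_path(adj, src, targets):
    """Dijkstra from src; return the join conditions on the cheapest path
    to any table in targets, or None if no such path exists."""
    dist = {src: 0.0}
    prev = {}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        if u in targets:
            path = []
            while u != src:
                u, cond = prev[u]  # step back to the parent table
                path.append(cond)
            return path
        for v, cond, w in adj[u]:
            if d + w < dist.get(v, math.inf):
                dist[v] = d + w
                prev[v] = (u, cond)
                heapq.heappush(heap, (d + w, v))
    return None


def recommend_join_graph(log, input_tables, output_tables):
    """Hypothetical helper: connect all input and output tables through join
    conditions seen in the log (greedy Steiner-tree heuristic), preferring
    frequently used joins. Returns (join conditions, weakest support)."""
    support = edge_support(log)
    adj = defaultdict(list)
    for cond, count in support.items():
        (ta, _), (tb, _) = cond
        if ta != tb:  # this sketch ignores self-joins
            adj[ta].append((tb, cond, 1.0 / count))  # well-supported joins are cheaper
            adj[tb].append((ta, cond, 1.0 / count))

    terminals = list(dict.fromkeys([*input_tables, *output_tables]))
    if not terminals:
        return [], 0
    tree_tables = {terminals[0]}
    chosen: list[JoinCond] = []
    for table in terminals[1:]:
        if table in tree_tables:
            continue
        path = cheapest_path(adj, table, tree_tables)
        if path is None:
            raise ValueError(f"the log gives no join path to {table!r}")
        chosen.extend(path)
        tree_tables.update(t for cond in path for t, _ in cond)
    # Crude evidence score: how many queries used the least-supported chosen join.
    evidence = min((support[c] for c in chosen), default=0)
    return chosen, evidence


# Toy log with hypothetical table and column names.
example_log = [
    [(("orders", "cust_id"), ("customers", "id")),
     (("orders", "id"), ("lineitem", "order_id"))],
    [(("orders", "cust_id"), ("customers", "id"))],
    [(("lineitem", "part_id"), ("parts", "id")),
     (("orders", "id"), ("lineitem", "order_id"))],
]

if __name__ == "__main__":
    joins, evidence = recommend_join_graph(example_log, ["customers"], ["parts"])
    for (ta, ca), (tb, cb) in joins:
        print(f"{ta}.{ca} = {tb}.{cb}")
    print("weakest support:", evidence)
```

On the toy log, connecting the input table `customers` to the output table `parts` pulls in `orders` and `lineitem` as intermediate tables, and the weakest chosen join (`lineitem.part_id = parts.id`) appears in only one logged query. A real evidence-based measure would weigh how the chosen joins co-occur within logged queries, which is what the slicing approach described above is designed to capture.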