Universal Top-k Keyword Search over Relational Databases

Keyword search is one of the most effective paradigms for information discovery, and one of its key advantages is simplicity. There is an increasing need to let ordinary users issue keyword queries without any knowledge of the database schema. The retrieval unit of keyword search over relational databases differs from that of IR systems: in an IR system the retrieval unit is a document, whereas in our case the result is a synthesized document formed by joining a number of tuples, which we call a joined tuple tree (JTT). We measure result quality using two metrics: content quality and structural quality. The content quality of a JTT is an IR-style score that indicates how well its information nodes match the keywords, while the structural quality of a JTT is a score that evaluates how meaningful the connections between its information nodes are, for example, the closeness of the corresponding relationship. We design a hybrid approach and develop a buffer system that dynamically maintains a partial data graph in memory. To reuse intermediate results of SQL queries, we break complex SQL queries into two types of simple queries. This allows us to support very large databases and reduce redundant computation. In addition, we conduct extensive experiments on large-scale real datasets to study the performance of the proposed approaches. The experiments show that our approach outperforms previous approaches, particularly in terms of result quality.
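
To make the two quality metrics concrete, the following minimal sketch ranks candidate JTTs by a weighted combination of a content score and a structural score. It is an illustration under stated assumptions, not the scoring function of this work: the `JTT` class, the term-frequency content score, the inverse-size structural score, and the weight `alpha` are all hypothetical stand-ins.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class JTT:
    """Hypothetical joined tuple tree: tuple texts plus a join-edge count."""
    tuples: tuple[str, ...]  # text content of each joined tuple
    edges: int               # number of join edges connecting the tuples


def content_score(jtt: JTT, keywords: list[str]) -> float:
    """Assumed IR-style score: fraction of words that are query keywords."""
    words = " ".join(jtt.tuples).lower().split()
    if not words:
        return 0.0
    hits = sum(words.count(k.lower()) for k in keywords)
    return hits / len(words)


def structural_score(jtt: JTT) -> float:
    """Assumed closeness score: fewer join edges means a tighter result."""
    return 1.0 / (1 + jtt.edges)


def top_k(candidates: list[JTT], keywords: list[str], k: int,
          alpha: float = 0.5) -> list[JTT]:
    """Return the k candidates with the highest combined score.

    alpha is a hypothetical weight balancing content against structure.
    """
    def combined(jtt: JTT) -> float:
        return (alpha * content_score(jtt, keywords)
                + (1 - alpha) * structural_score(jtt))

    return heapq.nlargest(k, candidates, key=combined)


# Example: a single matching tuple versus a looser three-tuple join.
single = JTT(("xml keyword search",), edges=0)
joined = JTT(("keyword query", "author smith", "search engine paper"), edges=2)
best = top_k([joined, single], ["keyword", "search"], k=1)
```

In this example the single-tuple result ranks first because it scores higher on both metrics: its text is denser in keywords and it needs no joins. A linear combination is only one possible design; other ways of merging the two scores would fit the same interface.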