DISCOVER: Keyword Search in Relational Databases

DISCOVER operates on relational databases and facilitates information discovery by letting users issue keyword queries without any knowledge of the database schema or of SQL. It returns qualified joining networks of tuples, that is, sets of tuples that are associated because they join on their primary and foreign keys and that collectively contain all the keywords of the query. DISCOVER proceeds in two steps. First, the Candidate Network Generator generates all candidate networks of relations, that is, join expressions that produce the joining networks of tuples. Then, the Plan Generator builds plans for the efficient evaluation of the set of candidate networks, exploiting opportunities to reuse common subexpressions among them. We prove that, by exploiting the structure of the schema, DISCOVER finds without redundancy all relevant candidate networks, whose size can be data bound. We prove that selecting the optimal execution plan (the best way to reuse common subexpressions) is NP-complete. We provide a greedy algorithm and show that it yields near-optimal plan execution cost. Our experiments also provide hints on tuning the greedy algorithm.
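
To make the notion of a joining network of tuples concrete, the following is a minimal brute-force sketch, not DISCOVER's algorithm: DISCOVER derives candidate networks from the schema and evaluates them as join expressions, whereas this toy simply enumerates small sets of tuples. The data, the tuple identifiers, and the names `TUPLES`, `JOINS`, and `joining_networks` are all hypothetical. A set of tuples is reported when it is connected through primary/foreign key joins, contains every keyword, and stops satisfying those conditions if any single tuple is removed.

```python
from itertools import combinations

# Hypothetical toy database: tuple id -> searchable text.
TUPLES = {
    "customer:1": "John Smith",
    "customer:2": "Ann Miller",
    "order:10": "rush delivery",
    "order:11": "gift wrap",
    "clerk:7": "Bob Miller",
}

# Hypothetical primary/foreign key joins between individual tuples.
JOINS = {
    frozenset(("order:10", "customer:1")),
    frozenset(("order:10", "clerk:7")),
    frozenset(("order:11", "customer:2")),
    frozenset(("order:11", "clerk:7")),
}


def is_connected(nodes, joins):
    """True if the tuples form one connected component under the joins."""
    nodes = set(nodes)
    if not nodes:
        return False
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        cur = stack.pop()
        for other in nodes - seen:
            if frozenset((cur, other)) in joins:
                seen.add(other)
                stack.append(other)
    return seen == nodes


def is_total(nodes, tuples, keywords):
    """True if the tuples collectively contain every keyword."""
    words = set()
    for node in nodes:
        words.update(tuples[node].lower().split())
    return keywords <= words


def joining_networks(tuples, joins, keywords, max_size):
    """Enumerate connected, keyword-covering tuple sets up to max_size
    from which no single tuple can be dropped."""
    keywords = {k.lower() for k in keywords}

    def qualifies(nodes):
        return is_connected(nodes, joins) and is_total(nodes, tuples, keywords)

    results = []
    for size in range(1, max_size + 1):
        for combo in combinations(sorted(tuples), size):
            if not qualifies(combo):
                continue
            # Skip sets that still qualify after removing one tuple.
            if any(qualifies(combo[:i] + combo[i + 1:]) for i in range(size)):
                continue
            results.append(combo)
    return results


if __name__ == "__main__":
    for network in joining_networks(TUPLES, JOINS, {"Smith", "Miller"}, 3):
        print(network)
```

On this toy data the query {Smith, Miller} with at most three tuples yields a single network, the order that links customer John Smith to clerk Bob Miller. Enumerating subsets is exponential in the number of tuples, which is why a schema-driven approach such as the one the abstract describes is needed at realistic scale.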
