Unstructured search on structured databases
Structured query languages such as SQL and XQuery have been extensively studied in the database area. However, this model of structured search requires users to have prior knowledge of the database schemas and the query language, which places a heavy burden on ordinary users. To help them find information, we adopt the model of unstructured search: users can simply write natural language queries or a set of keywords to express their information needs, without knowing the exact database schemas or using a structured query language. However, the unstructured search model is primarily intended for text databases, and applying it to structured databases is non-trivial.
We identify two problems in unstructured search on structured databases. (1) For an organization with a set of heterogeneous relational databases, build a mediator on top of them that provides a natural language interface to ordinary users. (2) For a relational database and a given keyword query, generate ranked answers (joining trees of tuples): the higher an answer is ranked, the more relevant it is supposed to be to the query. This dissertation is the first work to give a detailed description of an intranet mediator with a natural language interface for different applications. We present a general approach that selects the proper databases to answer a given natural language query (NLQ); we propose a graph-matching algorithm to match queries and databases; and we give experimental results that demonstrate the effectiveness of our solution. We are also the first to conduct comprehensive experiments on the effectiveness problem for unstructured search in relational databases. We identify four new factors that are critical to search effectiveness in relational databases, and we propose a novel ranking strategy to address the effectiveness problem. Answers are returned with basic semantics. Experimental results show that our strategy is significantly more effective than existing work (77.4% better than related work and 16.3% better than Google).