Query preprocessing for integrated search in heterogeneous data sources

SINGAPORE (SINGle Access POint for heterogeneous data REpositories) is a system for querying heterogeneous data. One of its distinguishing features is that new sources may be registered at runtime. For this reason it does not rely on a predefined global integrated schema; instead, users integrate data from the underlying sources at query time. Since formulating such queries can be demanding, our system accepts fuzzy queries, which are easier to write but may produce less exact results. As a consequence, input queries need special treatment, called query preprocessing, which generates complex target queries that return the results for the initial user queries. In this paper we discuss the importance of query preprocessing in our system, present heuristics for implementing it, and show how techniques from database management systems and information retrieval can be combined in the process of query transformation.