SINGAPORE (SINGle Access POint for heterogeneous data REpositories) is a system for querying heterogeneous data. One of its distinguishing features is that new sources may be registered at runtime. For this reason, it does not rely on a predefined global integrated schema; instead, users integrate data from the underlying sources at query time. Since formulating such queries can be demanding, our system accepts fuzzy queries, which are easier to write at the expense of possibly less exact results. As a consequence, input queries need special treatment, called query preprocessing, which generates complex target queries that return the results for the initial user queries. In this paper we discuss the importance of query preprocessing in our system, present heuristics for implementing it, and show how techniques from database management systems and information retrieval can be combined in the process of query transformation.