Peer data management systems (PDMS) are the natural extension of integrated information systems. Conventionally, a single integrating system manages an integrated schema, distributes queries to appropriate sources, and integrates incoming data into a common result. In contrast, a PDMS consists of a set of peers, each of which can play the role of an integrating component. A peer knows its neighboring peers through mappings, which help to translate queries and transform data. Queries submitted to one peer are answered by data residing at that peer and by data reached along paths of mappings through the network of peers. The only restriction that keeps a PDMS from covering unbounded data is the need to formulate at least one mapping from some known peer to each new data source. We propose a Semantic Web based method that overcomes this restriction, albeit at a price. As sources are dynamically and automatically included in a PDMS, three factors diminish quality: the new source itself might store data of poor quality, the mapping to the PDMS might be incorrect, and the mapping to the PDMS might be incomplete. To compensate, we propose a quality model to measure this effect, a cost model to restrict query planning to the best paths through the PDMS, and techniques to answer queries efficiently in such Web-scale PDMS.

1 An Ever-growing PDMS

The step from centralized database systems (DBMS) to distributed and then to federated database systems (FDBMS) removed the assumption that data must be located at the same site as the query. A federated database provides a global schema that represents the data it can access locally and remotely. The global schema is related to the local schemata via schema mappings, which specify how the schema of each local database maps to the global schema. The federated database accepts a query against its global schema and distributes it, according to the schema mappings, to the sites where the data resides.
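The distribution step just described can be illustrated with a minimal sketch. It assumes, purely for illustration, that a schema mapping is a dictionary from global attribute names to local attribute names and that a query is a simple projection over global attributes; the names `Site`, `distribute`, and the example sites are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass
class Site:
    """A local database with a schema mapping (hypothetical structure)."""
    name: str
    # Assumed mapping format: global attribute name -> local attribute name
    mapping: dict


def distribute(global_attrs, sites):
    """Rewrite a projection over the global schema into per-site partial queries.

    A site receives a partial query only for the global attributes
    that its schema mapping covers.
    """
    partial = {}
    for site in sites:
        covered = [a for a in global_attrs if a in site.mapping]
        if covered:
            # Translate global attribute names into the site's local names
            partial[site.name] = [site.mapping[a] for a in covered]
    return partial


# Illustrative sites; attribute names are made up for this sketch
sites = [
    Site("library_a", {"title": "book_title", "author": "writer"}),
    Site("library_b", {"title": "name", "year": "pub_year"}),
    Site("archive_c", {"publisher": "pub"}),
]

partial_queries = distribute(["title", "author"], sites)
print(partial_queries)
# {'library_a': ['book_title', 'writer'], 'library_b': ['name']}
```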
Those sites execute the partial queries and send results back to the requesting peer. Again, the schema mappings specify how data is to be translated to conform to the global schema. The results are then processed, combined, and finally fused into a single response to the user.

A natural extension to this paradigm is to remove the assumption that queries are only asked against a single integrating site. Peer data management systems (PDMS) consist of multiple peers, each of which provides a schema and accepts queries against that schema. Again, the peers are connected by mappings among their schemata. However, instead of forming a tree with a single root, each peer can be connected to any number of other peers. Queries against the schema of one peer can be answered using the data of the entire PDMS, as long as appropriate mappings have been formed (see Fig. 1). In general, a query