Data management in peer-to-peer data integration systems

Decentralized data management has been addressed over the years by several technical solutions, ranging from distributed DBMSs to mediator-based data integration systems. More recently, the issue has been investigated in the context of Peer-to-Peer (P2P) architectures. In this chapter we focus on P2P data integration systems, which consist of several autonomous peers: each peer is essentially an autonomous information system that holds data and is linked to other peers by means of P2P mappings. P2P data integration does not rely on the notion of a global schema, as traditional mediator-based data integration does. Rather, it computes answers to users’ queries, posed to any peer of the system, on the basis of both local data and the P2P mappings, thus overcoming the main drawbacks of centralized mediator-based data integration systems and providing the foundations of effective data management in virtual organizations. In this chapter we first survey the most significant approaches proposed in the literature for both mediator-based data integration and P2P data management. Then, we focus on advanced schema-based P2P systems whose aim is the semantic integration of data, and analyze the commonly adopted approach of interpreting such systems using a first-order semantics. We show some weaknesses of this approach, and compare it with an alternative approach, based on multi-modal epistemic semantics, which reflects the idea that each peer is conceived as a rational agent that exchanges knowledge/belief with other peers. We consider several central properties of P2P data integration systems: modularity, generality, and decidability. We argue that the approach based on epistemic logic is superior with respect to all the
