GrouPeer: Dynamic clustering of P2P databases

Sharing structured data in a P2P network is a challenging problem, especially in the absence of a mediated schema. The standard practice of answering a query that has been successively rewritten along its propagation path often results in significant loss of information. Mediated schemas, on the other hand, require human interaction and global agreement, both at creation and during maintenance. In this paper we present GrouPeer, an adaptive, automated approach to both issues in the context of unstructured P2P database overlays. By allowing peers to choose individually which rewritten version of a query to answer, and to evaluate the answers received, GrouPeer discovers information-rich sources that would otherwise remain hidden. Gradually, the overlay is restructured as semantically similar peers are clustered together. Experimental results show that our technique produces very accurate answers and builds clusters very close to the optimal ones while contacting only a very small number of nodes in the overlay.
