Peer-to-Peer Data Management

This lecture provides a systematic introduction to the problem of managing large data collections in peer-to-peer systems. Search over large datasets has always been a key problem in peer-to-peer systems, and the peer-to-peer paradigm has inspired new directions in the field of data management. The result is a large body of peer-to-peer data management concepts and algorithms that support data management tasks in a wider sense, including data integration, document management, and text retrieval. The lecture covers four types of peer-to-peer data management systems, characterized by the type of data they manage and the search capabilities they support. The first type comprises structured peer-to-peer data management systems, which support structured query capabilities for standard data models. The second type comprises peer-to-peer data integration systems, which query heterogeneous databases without requiring a common global schema. The third type comprises peer-to-peer document retrieval systems, which enable document search based on both textual content and document structure. Finally, we introduce semantic overlay networks, which support similarity search on information represented in hierarchically organized and multi-dimensional semantic spaces. Topics that go beyond data representation and search are summarized at the end of the lecture.

Table of Contents: Introduction / Structured Peer-to-Peer Databases / Peer-to-Peer Data Integration / Peer-to-Peer Retrieval / Semantic Overlay Networks / Conclusion
