Distributed Large-Scale Information Filtering

We study the problem of distributed resource sharing in peer-to-peer networks and focus on the problem of information filtering. In our setting, subscriptions and publications are specified using an expressive attribute-value representation that supports both the Boolean and Vector Space models. We use an extension of the distributed hash table Chord to organise the nodes and store user subscriptions, and utilise efficient publication protocols that keep the network traffic and latency low at filtering time. To verify our approach, we evaluate the proposed protocols experimentally using thousands of nodes, millions of user subscriptions, and two different real-life corpora. We also study three important facets of the load-balancing problem in such a scenario and present a novel algorithm that manages to distribute the load evenly among the nodes. Our results show that the designed protocols are scalable and efficient: they achieve expressive information filtering functionality with low message traffic and latency.
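As a rough illustration of the general idea of storing attribute-value subscriptions on a Chord-style identifier ring, the following minimal sketch hashes attribute names onto a ring and assigns each key to its successor node. It is not the protocol proposed in this work: the class `Ring`, the function `ring_id`, the identifier size `M`, and the choice to index each subscription under a single attribute are all assumptions made for illustration, and only Boolean equality matching is shown (no Vector Space similarity, finger tables, or load balancing).

```python
import hashlib
from bisect import bisect_left
from collections import defaultdict

M = 16  # number of identifier bits; an assumed, deliberately small value


def ring_id(key: str) -> int:
    """Map a string to an identifier on a ring of size 2**M."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** M)


class Ring:
    """Hypothetical in-memory stand-in for a Chord-like overlay."""

    def __init__(self, node_names):
        # Sorted node identifiers; duplicates from hash collisions are dropped.
        self.node_ids = sorted({ring_id(name) for name in node_names})
        # node id -> list of (subscriber, subscription) pairs stored there
        self.store = defaultdict(list)

    def successor(self, identifier: int) -> int:
        """First node whose id is >= identifier, wrapping around the ring."""
        i = bisect_left(self.node_ids, identifier)
        return self.node_ids[i % len(self.node_ids)]

    def subscribe(self, subscriber: str, subscription: dict) -> int:
        """Store the subscription at the node responsible for one attribute."""
        # Assumption: index under the alphabetically smallest attribute name.
        attribute = min(subscription)
        node = self.successor(ring_id(attribute))
        self.store[node].append((subscriber, subscription))
        return node

    def publish(self, publication: dict) -> set:
        """Visit the node responsible for each attribute and match locally."""
        matched = set()
        for attribute in publication:
            node = self.successor(ring_id(attribute))
            for subscriber, subscription in self.store[node]:
                # Boolean match: every subscribed attribute has the same value.
                if all(publication.get(a) == v for a, v in subscription.items()):
                    matched.add(subscriber)
        return matched


ring = Ring([f"node{i}" for i in range(8)])
ring.subscribe("alice", {"author": "Koubarakis", "venue": "ICDE"})
ring.subscribe("bob", {"venue": "VLDB"})
notified = ring.publish({"author": "Koubarakis", "venue": "ICDE", "year": "2006"})
```

In this sketch a matching publication always carries every attribute of the subscription, so it necessarily visits the node where that subscription was indexed. A real deployment would route each lookup through the overlay rather than a shared sorted list.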
