MAPS: approximate publish/subscribe functionality in peer-to-peer networks

Information filtering has been a research issue for years. In an information filtering scenario, users' information needs are expressed as subscriptions, and users are notified about published documents or events that match these interests. Combining the publish/subscribe scenario with the peer-to-peer (P2P) approach of autonomous peers places high demands on the scalability and efficiency of such a highly distributed network. In many cases, however, a subscriber is not interested in all the events that match his profile, but rather in a small representative set. In this paper, we present our approach to an approximate publish/subscribe system that relaxes the assumption of receiving notifications from every information producer in the network. Our work builds upon distributed hash table technology to create and maintain a distributed global directory that contains information about peers' publishing behavior. It combines the current state of a peer with a prediction of its future publishing behavior to store a subscription only at the most promising peers in the network. Our experimental evaluation shows that approximate information filtering achieves satisfying recall levels and is able to accommodate changes in peer publishing behavior.
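The peer selection idea described above can be illustrated with a minimal sketch. The abstract does not specify the scoring function or the prediction method, so everything here is an assumption made for illustration: the `PeerStats` record, the use of double exponential smoothing for the forecast, the smoothing parameters `alpha` and `beta`, and the linear `weight` that mixes current state with predicted growth are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PeerStats:
    """Hypothetical directory entry for one peer and one subscription."""
    peer_id: str
    # Counts of matching documents seen at successive directory updates,
    # oldest first (an assumed statistic, not taken from the paper).
    history: List[float]


def predict_next(history: List[float], alpha: float = 0.5, beta: float = 0.5) -> float:
    """One-step forecast via double exponential smoothing (assumed predictor)."""
    if not history:
        return 0.0
    if len(history) == 1:
        return history[0]
    level = history[0]
    trend = history[1] - history[0]
    for value in history[1:]:
        prev_level = level
        level = alpha * value + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return level + trend


def rank_peers(stats: List[PeerStats], k: int, weight: float = 0.5) -> List[str]:
    """Return the ids of the k peers with the highest combined score."""
    def score(s: PeerStats) -> float:
        current = s.history[-1] if s.history else 0.0
        # Only predicted growth counts; a predicted decline adds nothing.
        growth = max(predict_next(s.history) - current, 0.0)
        return weight * current + (1 - weight) * growth

    ranked = sorted(stats, key=score, reverse=True)
    return [s.peer_id for s in ranked[:k]]


peers = [
    PeerStats("a", [100, 100, 100]),   # large but static collection
    PeerStats("b", [0, 10, 20, 30]),   # small but growing
    PeerStats("c", [50, 40, 30]),      # same current size as b, shrinking
]
top_two = rank_peers(peers, k=2)
```

In this sketch, peers `b` and `c` currently hold the same number of matching documents, but `b` is ranked higher because its history predicts further publications. A subscription would then be stored only at the peers returned by `rank_peers`, rather than at every producer in the network.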
