MinervaDL: An Architecture for Information Retrieval and Filtering in Distributed Digital Libraries

We present MinervaDL, a digital library architecture that supports approximate information retrieval and filtering functionality under a single unifying framework. The architecture of MinervaDL is based on the peer-to-peer search engine Minerva and can handle huge amounts of data provided by digital libraries in a distributed and self-organizing way. The two-tier architecture and the use of a distributed hash table as the routing substrate provide an infrastructure for creating large networks of digital libraries with minimal administration costs. We discuss the main components of this architecture, present the protocols that regulate node interactions, and experimentally evaluate our approach.
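
To make the idea of a distributed hash table as a routing substrate concrete, the following is a minimal sketch of consistent-hashing lookup on an identifier ring. It is not taken from MinervaDL: the class `DirectoryRing`, the function `ring_id`, the node names, and the 32-bit identifier space are all assumptions chosen for illustration, and the actual protocols may differ.

```python
import hashlib
from bisect import bisect_left


def ring_id(key: str, bits: int = 32) -> int:
    """Hash a string onto an identifier ring of size 2**bits (assumed size)."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** bits)


class DirectoryRing:
    """Hypothetical ring of directory nodes; each key belongs to its successor."""

    def __init__(self, node_names):
        # Sort nodes by their position on the ring.
        self._ring = sorted((ring_id(name), name) for name in node_names)
        self._ids = [node_id for node_id, _ in self._ring]

    def successor(self, key: str) -> str:
        """Return the first node whose id is >= the key's id, wrapping around."""
        idx = bisect_left(self._ids, ring_id(key))
        return self._ring[idx % len(self._ring)][1]


# Illustrative node names, not names used by the system.
ring = DirectoryRing(["node-a", "node-b", "node-c"])
responsible = ring.successor("information filtering")
```

In this sketch every participant that hashes the same term reaches the same responsible node without central coordination, which is the property that lets such a network be run with little administration.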

[1]  Architectural Alternatives for Information Filtering in Structured Overlay Networks , 2007 .

[2]  David R. Karger,et al.  OverCite: A Cooperative Digital Research Library , 2005, IPTPS.

[3]  Divyakant Agrawal,et al.  Content-Based Similarity Search over Peer-to-Peer Systems , 2004, DBISP2P.

[4]  Gerhard Weikum,et al.  Improving Collection Selection with Overlap-Awareness , 2005 .

[5]  Peter R. Pietzuch,et al.  Peer-to-peer overlay broker networks in an event-based middleware , 2003, DEBS '03.

[6]  Jamie Callan,et al.  DISTRIBUTED INFORMATION RETRIEVAL , 2002 .

[7]  Gerhard Weikum,et al.  Discovering and exploiting keyword and attribute-value co-occurrences to improve P2P routing indices , 2006, CIKM '06.

[8]  Miguel Castro,et al.  Peer-to-Peer Systems IV, 4th International Workshop, IPTPS 2005, Ithaca, NY, USA, February 24-25, 2005, Revised Selected Papers , 2005, IPTPS.

[9]  Joemon M. Jose,et al.  An architecture for peer-to-peer information retrieval , 2003, SIGIR '03.

[10]  Wei Zhao,et al.  Networking and Mobile Computing, Third International Conference, ICCNMC 2005, Zhangjiajie, China, August 2-4, 2005, Proceedings , 2005, ICCNMC.

[11]  Manolis Koubarakis,et al.  P2P-DIET: an extensible P2P service that unifies ad-hoc and continuous querying in super-peer networks , 2004, SIGMOD '04.

[12]  David R. Karger,et al.  Chord: a scalable peer-to-peer lookup protocol for internet applications , 2003, TNET.

[13]  Luis Gravano,et al.  GlOSS: text-source discovery over the Internet , 1999, TODS.

[14]  W. Bruce Croft,et al.  Searching distributed collections with inference networks , 1995, SIGIR '95.

[15]  Peter Triantafillou,et al.  Internet scale string attribute publish/subscribe data networks , 2005, CIKM '05.

[16]  Jie Lu,et al.  Federated Search of Text-Based Digital Libraries in Hierarchical Peer-to-Peer Networks , 2005, Workshop on Peer-to-Peer Information Retrieval.

[17]  Beverly Yang,et al.  Retroactive answering of search queries , 2006, WWW '06.

[18]  David R. Karger,et al.  Chord: A scalable peer-to-peer lookup service for internet applications , 2001, SIGCOMM '01.

[19]  Manolis Koubarakis,et al.  LibraRing: An Architecture for Distributed Digital Libraries Based on DHTs , 2005, ECDL.

[20]  Torsten Suel,et al.  ODISSEA: A Peer-to-Peer Architecture for Scalable Web Search and Information Retrieval , 2003, WebDB.

[21]  Manolis Koubarakis,et al.  Filtering algorithms for information retrieval models with named attributes and proximity operators , 2004, SIGIR '04.

[22]  Jie Lu,et al.  Content-based retrieval in hybrid peer-to-peer networks , 2003, CIKM '03.

[23]  Zhichen Xu,et al.  pFilter: Global Information Filtering and Dissemination , 2002 .

[24]  Sandhya Dwarkadas,et al.  Peer-to-peer information retrieval using self-organizing semantic overlay networks , 2003, SIGCOMM '03.

[25]  Tore Risch,et al.  EDUTELLA: a P2P networking infrastructure based on RDF , 2002, WWW.

[26]  Gerhard Weikum,et al.  Architectural Alternatives for Information Filtering in Structured Overlays , 2007, IEEE Internet Computing.

[27]  Miguel Castro,et al.  SCRIBE: The Design of a Large-Scale Event Notification Infrastructure , 2001, Networked Group Communication.

[28]  Wolfgang Nejdl,et al.  Publish/Subscribe for RDF-based P2P Networks , 2004, ESWS.

[29]  Zhichen Xu,et al.  pFilter: global information filtering and dissemination using structured overlay networks , 2003, The Ninth IEEE Workshop on Future Trends of Distributed Computing Systems, 2003. FTDCS 2003. Proceedings..

[30]  Chung-Ta King,et al.  Similarity discovery in structured P2P overlays , 2003, 2003 International Conference on Parallel Processing, 2003. Proceedings..

[31]  Gerhard Weikum,et al.  MINERVA: Collaborative P2P Search , 2005, VLDB.

[32]  Hector Garcia-Molina,et al.  The SIFT information dissemination system , 1999, TODS.

[33]  P. A. Blight The Analysis of Time Series: An Introduction , 1991 .

[34]  Beng Chin Ooi,et al.  MINERVA: Collaborative P2P Search (Demo) , 2005 .

[35]  Peter R. Pietzuch,et al.  Hermes: a distributed event-based middleware architecture , 2002, Proceedings 22nd International Conference on Distributed Computing Systems Workshops.

[36]  Gerhard Weikum,et al.  Improving collection selection with overlap awareness in P2P search engines , 2005, SIGIR '05.