OverCite: A Cooperative Digital Research Library

CiteSeer is a well-known online resource for the computer science research community, allowing users to search and browse a large archive of research papers. Unfortunately, its current centralized incarnation is costly to run. Although members of the community would presumably be willing to donate hardware and bandwidth at their own sites to assist CiteSeer, the current architecture does not facilitate such distribution of resources. OverCite proposes a new architecture for a distributed, cooperative research library based on a distributed hash table (DHT). By harnessing resources at many sites, the new architecture will be able to scale to larger data sets and support new features such as document alerts.
