Extending OAI-PMH over structured P2P networks for digital preservation

The Open Archives Initiative (OAI) allows libraries and museums to create and share their own low-cost digital libraries (DLs). OAI DLs are based on the OAI-PMH protocol, which, although well established as a standard for disseminating metadata, addresses neither digital preservation nor content availability, both essential requirements in this type of system. Building new mechanisms that provide these guarantees at little or no additional cost is a considerable challenge. This article proposes a distributed archiving system based on a P2P network that allows OAI-based libraries to replicate digital objects to ensure their reliability and availability. The proposed system preserves and extends the characteristics of the current OAI-PMH protocol and is designed as a set of OAI repositories, each with an independent failure probability assigned to it. Each item is inserted with a target reliability, which is met by replicating the item across a subset of repositories. Communication between the nodes (repositories) of the network is organized as a distributed hash table, and multiple hash functions are used to select the repositories that keep the replicas of each stored item. Combining the OAI characteristics with a structured P2P digital preservation system allows the construction of a reliable and fully distributed digital library. The archiving system was evaluated through experiments in a real environment, and the OAI-PMH extension was validated by implementing a proof-of-principle prototype.
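
The replica-selection idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that an item is lost only when every repository holding a replica fails, so that with independent failure probabilities p_i the reliability of a replica set is 1 minus the product of the p_i. The function names (`h`, `select_repositories`), the salted SHA-256 hash family, and the successor-on-a-ring placement are all hypothetical choices made for illustration.

```python
import hashlib
from bisect import bisect_left


def h(key, salt, bits=32):
    """Hypothetical family of hash functions: SHA-256 salted with an index."""
    digest = hashlib.sha256(f"{salt}:{key}".encode()).digest()
    return int.from_bytes(digest[: bits // 8], "big")


def select_repositories(item_id, repos, target, max_hashes=64):
    """Pick repositories for the replicas of one item.

    repos maps a repository name to its (assumed independent) failure
    probability. Hash functions 1, 2, 3, ... are applied to the item
    identifier; each result is mapped to the first repository at or after
    that position on the ring. Replicas are added until the assumed
    reliability 1 - prod(p_i) reaches the target.
    """
    # Place repositories on the identifier ring using hash function 0.
    ring = sorted((h(name, 0), name) for name in repos)
    positions = [pos for pos, _ in ring]

    chosen = []
    loss = 1.0  # probability that every chosen repository fails
    for k in range(1, max_hashes + 1):
        if 1.0 - loss >= target:
            break
        idx = bisect_left(positions, h(item_id, k)) % len(ring)
        name = ring[idx][1]
        if name in chosen:
            continue  # this hash function hit a repository already in use
        chosen.append(name)
        loss *= repos[name]

    reliability = 1.0 - loss
    if reliability < target:
        raise ValueError("target reliability cannot be met with these repositories")
    return chosen, reliability


if __name__ == "__main__":
    repositories = {f"repo{i}": 0.2 for i in range(10)}
    replicas, achieved = select_repositories("oai:example:item-1", repositories, 0.99)
    print(replicas, achieved)
```

Because the choice depends only on the item identifier and the hash family, any node can recompute where the replicas of an item should be without a central index. A greedy loop like this one does not try to minimize the number of replicas or balance load; the paper's actual selection strategy may differ.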
