Web Synchronization Simulations using the ResourceSync Framework

Maintenance of multiple, distributed up-to-date copies of collections of changing Web resources is important in many application contexts and is often achieved using ad hoc or proprietary synchronization solutions. ResourceSync is a resource synchronization framework that integrates with the Web architecture and leverages XML sitemaps. We define a model for the ResourceSync framework as a basis for understanding its properties. We then describe experiments in which simulations of a variety of synchronization scenarios illustrate the effects of model configuration on consistency, latency, and data transfer efficiency. These results provide insight into which configurations are appropriate for various application scenarios.
