Scalable application-aware data freshening

Distributed databases and other networked information systems use copies, or mirrors, to reduce latency and increase availability. These copies must be refreshed. In a loosely coupled system, each copy site is typically responsible for synchronizing its own copies. This requires polling, which can be expensive if not done in a disciplined way. We study how to determine a refresh schedule given knowledge of the update frequencies and a limited bandwidth budget. Our emphasis is on using additional information about the aggregate interest of the user community in each copy in order to maximize the perceived freshness of the copies. We develop a model and an optimal solution for small cases, present several heuristic algorithms that scale to large cases, and then explore the impact of object size on the refresh schedule. We also present experimental evidence that our algorithms perform well.
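The abstract does not spell out its model, so the following is only a sketch under assumptions of our own. Each copy i is assumed to change as a Poisson process with known rate λ_i, to be refreshed f_i times per unit time at evenly spaced instants, and to carry a user-interest weight w_i. Under these assumptions the time-averaged probability that copy i is fresh is F_i = (1 - e^(-λ_i/f_i)) · f_i/λ_i, and perceived freshness can be taken as the interest-weighted average Σ w_i F_i / Σ w_i. If all objects have the same size and the bandwidth budget allows B refreshes per unit time, handing out refreshes one at a time by largest marginal gain maximizes this objective, because each F_i is concave in f_i. The function names are hypothetical; this is not the optimal solution or the heuristics developed in the paper, and it ignores object size.

```python
import heapq
import math
from typing import List, Sequence


def expected_freshness(change_rate: float, refreshes: int) -> float:
    """Time-averaged probability that a copy is fresh (ASSUMED model).

    Assumes updates arrive as a Poisson process with rate `change_rate` and
    that the copy is refreshed `refreshes` times per unit time at evenly
    spaced instants.
    """
    if change_rate == 0:
        return 1.0  # a source that never changes is always fresh
    if refreshes == 0:
        return 0.0  # never refreshed: time-averaged freshness tends to 0
    x = change_rate / refreshes  # expected number of updates per refresh interval
    return (1.0 - math.exp(-x)) / x


def perceived_freshness(change_rates: Sequence[float], interest: Sequence[float],
                        counts: Sequence[int]) -> float:
    """Interest-weighted average freshness (hypothetical objective)."""
    total = sum(interest)
    return sum(w * expected_freshness(lam, c)
               for lam, w, c in zip(change_rates, interest, counts)) / total


def greedy_schedule(change_rates: Sequence[float], interest: Sequence[float],
                    budget: int) -> List[int]:
    """Assign `budget` equal-cost refreshes per unit time, one at a time,
    each to the copy with the largest interest-weighted marginal gain.

    Hypothetical helper; optimal only under the equal-size assumption,
    because expected_freshness is concave in the refresh count.
    """
    counts = [0] * len(change_rates)

    def gain(i: int) -> float:
        lam = change_rates[i]
        return interest[i] * (expected_freshness(lam, counts[i] + 1)
                              - expected_freshness(lam, counts[i]))

    # heapq is a min-heap, so store negated gains to pop the largest gain.
    heap = [(-gain(i), i) for i in range(len(change_rates))]
    heapq.heapify(heap)
    for _ in range(budget):
        if not heap:
            break
        _, i = heapq.heappop(heap)
        counts[i] += 1
        heapq.heappush(heap, (-gain(i), i))  # only copy i's gain changed
    return counts


if __name__ == "__main__":
    rates = [4.0, 1.0, 0.2]      # hypothetical updates per unit time
    weights = [0.1, 0.6, 0.3]    # hypothetical user-interest weights
    plan = greedy_schedule(rates, weights, budget=6)
    print(plan, perceived_freshness(rates, weights, plan))
```

Keeping the candidates in a heap makes each allocation step O(log n) in the number of copies. Once objects have different sizes, dividing each gain by the object's size gives a natural ratio rule, but the problem then becomes a knapsack variant in which such a greedy rule is only a heuristic.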