On sample-path staleness in lazy data replication

We analyze synchronization issues that arise between two stochastic point processes: one models data churn at an information source, and the other models periodic downloads by its replica (e.g., a search engine, web cache, or distributed database). Because synchronization is lazy (pull-based), the replica experiences recurrent staleness. This staleness translates into a penalty that stems from the replica's reduced ability to perform consistent computation and/or provide up-to-date responses to customer requests. We model this system under non-Poisson update and refresh processes and obtain sample-path averages of various staleness-cost metrics, generalizing previous results and exposing new problems in this field.
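The two-process model can be illustrated with a small simulation that measures staleness along a single sample path. This is a minimal sketch, not the method of this paper. It assumes the replica is fresh at time 0, that each refresh instantly makes it fresh, and that an update coinciding exactly with a refresh is captured by that refresh. It uses two illustrative staleness-cost metrics: the fraction of time the replica is stale, and its mean age, defined here as the time elapsed since the first update the replica has not yet seen (zero while fresh). The function names `staleness_averages` and `renewal_times`, and the choice of Pareto update gaps with constant (periodic) refresh gaps, are hypothetical choices made only for this example.

```python
import bisect
import random


def staleness_averages(updates, refreshes, horizon):
    """Sample-path averages over [0, horizon] (illustrative metrics).

    Returns (fraction of time stale, time-average age), where age is the
    time since the first update the replica has not yet downloaded.
    Assumes the replica is fresh at time 0 and each refresh makes it fresh.
    """
    ups = sorted(u for u in updates if 0 < u < horizon)
    # Synchronization points split [0, horizon] into refresh intervals.
    bounds = [0.0] + sorted(r for r in refreshes if 0 < r < horizon) + [horizon]
    stale_time = 0.0
    age_area = 0.0  # integral of age(t) over [0, horizon]
    for a, b in zip(bounds, bounds[1:]):
        # First update strictly after the refresh at time a.
        i = bisect.bisect_right(ups, a)
        if i < len(ups) and ups[i] < b:
            d = b - ups[i]          # replica stays stale until the next refresh
            stale_time += d
            age_area += d * d / 2.0  # age grows linearly from 0 to d
    return stale_time / horizon, age_area / horizon


def renewal_times(sample_gap, horizon, rng):
    """Event times of a renewal process on (0, horizon) with i.i.d. gaps."""
    t, times = 0.0, []
    while True:
        t += sample_gap(rng)
        if t >= horizon:
            return times
        times.append(t)


if __name__ == "__main__":
    rng = random.Random(1)
    T = 10_000.0
    # Non-Poisson source churn: Pareto inter-update gaps (mean 1.5).
    updates = renewal_times(lambda r: r.paretovariate(3.0), T, rng)
    # Periodic (deterministic) refreshes every 2 time units.
    refreshes = renewal_times(lambda r: 2.0, T, rng)
    frac, age = staleness_averages(updates, refreshes, T)
    print(f"fraction stale = {frac:.3f}, mean age = {age:.3f}")
```

Splitting the path at refresh instants keeps the computation exact for any pair of point processes: within each refresh interval only the first unseen update matters, so the cost of a sample path reduces to one binary search per refresh.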
