On sample-path staleness in lazy data replication

We analyze synchronization between two point processes: one models data churn at an information source, and the other models periodic downloads to its replica (e.g., a search engine, web cache, or distributed database). Because synchronization is pull-based, the replica experiences recurrent staleness, which incurs a penalty stemming from its reduced ability to perform consistent computation and/or provide up-to-date responses to customer requests. We model this system under non-Poisson update/refresh processes and obtain sample-path averages of various staleness-cost metrics, generalizing previous results and exposing novel problems in this field.
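
To make the setup concrete, here is a minimal simulation sketch of one possible staleness metric: the fraction of time the replica is out of date along a single sample path. It is an illustration under stated assumptions, not the paper's method. The function names (`renewal_points`, `stale_fraction`), the choice of renewal processes, the Pareto update gaps, the constant refresh period, and the convention that the replica starts synchronized at time 0 are all hypothetical choices made for this example.

```python
import random
from typing import Callable, List, Sequence


def renewal_points(gap: Callable[[], float], horizon: float) -> List[float]:
    """Event times in (0, horizon] of a renewal process with i.i.d. gaps drawn from gap()."""
    points: List[float] = []
    t = gap()
    while t <= horizon:
        points.append(t)
        t += gap()
    return points


def stale_fraction(updates: Sequence[float],
                   refreshes: Sequence[float],
                   horizon: float) -> float:
    """Fraction of [0, horizon] during which the replica is stale.

    Assumptions of this sketch: both sequences are sorted and lie in
    (0, horizon]; the replica is synchronized at time 0; it becomes stale at
    the first source update after a refresh and stays stale until the next
    refresh; a refresh that coincides with an update captures that update.
    """
    stale_time = 0.0
    i = 0
    start = 0.0
    for end in list(refreshes) + [horizon]:
        # Skip updates already captured by the refresh at `start`.
        while i < len(updates) and updates[i] <= start:
            i += 1
        # The first update inside (start, end) makes the replica stale until `end`.
        if i < len(updates) and updates[i] < end:
            stale_time += end - updates[i]
        start = end
    return stale_time / horizon


if __name__ == "__main__":
    rng = random.Random(1)
    horizon = 10_000.0
    # Hypothetical non-Poisson source: Pareto-distributed gaps between updates.
    updates = renewal_points(lambda: rng.paretovariate(2.5), horizon)
    # Hypothetical pull schedule: one download every 2 time units.
    refreshes = renewal_points(lambda: 2.0, horizon)
    print(f"stale fraction: {stale_fraction(updates, refreshes, horizon):.3f}")
```

Swapping in other gap distributions for either process gives a quick empirical view of how the time-average staleness depends on the update and refresh processes, rather than only on their rates.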

Dmitri Loguinov, et al. INFOCOM, 2015.