Long-term Optimization of Update Frequencies for Decaying Information

Many kinds of information, such as addresses, crawls of web pages, or academic affiliations, become outdated over time. In some applications, such information is therefore updated periodically to keep it correct and useful. Since refreshing information usually incurs a cost, e.g., computation time, network bandwidth, or human work time, the problem arises of finding the right update frequency, depending on the benefit gained from the information and on the speed with which it is expected to become outdated. This is especially important because entities often decay at different speeds: addresses of students change more frequently than addresses of pensioners, and news portals change more frequently than personal homepages. Thus, there is no uniform best update frequency for all entities. Previous work [5] on data freshness has focused on how to best distribute a fixed budget of updates among entities, which is of interest in the short term, when resources are fixed and cannot be adjusted. In the long term, many businesses can adjust their resources in order to optimize their gain. The problem is then no longer one of distributing a fixed number of updates, but one of determining the update frequency that optimizes the overall gain from the information. In this paper, we investigate how the optimal update frequency for decaying information can be determined. We show that the optimal update frequency can be determined independently for each entity, and that simple iteration can be used to find it. An implementation of our solution for exponential decay is available online.
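To illustrate the exponential-decay case, the following minimal Python sketch may help; note that the gain model and the names benefit_rate, update_cost, and decay_rate are illustrative assumptions, not the paper's exact formulation. If an entity yields a benefit rate b while fresh, costs c per update, and decays at rate lambda (i.e., it is still fresh with probability exp(-lambda*t) at age t after an update), then a fixed update interval T yields an expected long-term gain per unit time of g(T) = b * (1 - exp(-lambda*T)) / (lambda*T) - c/T. The sketch maximizes g by bisection on the first-order condition, a simple stand-in for the iteration scheme mentioned above.

import math

def optimal_update_interval(benefit_rate, update_cost, decay_rate,
                            tol=1e-10, max_iter=200):
    # Gain per unit time with update interval T under exponential decay:
    #   g(T) = benefit_rate * (1 - exp(-decay_rate*T)) / (decay_rate*T)
    #          - update_cost / T.
    # Setting g'(T) = 0 and substituting u = decay_rate * T gives
    #   benefit_rate * (1 - (1 + u) * exp(-u)) = update_cost * decay_rate,
    # whose left-hand side increases monotonically from 0 to benefit_rate,
    # so the root (if any) is unique and bisection finds it.
    target = update_cost * decay_rate
    if target >= benefit_rate:
        return math.inf  # updating never pays off; let the entity decay

    def lhs(u):
        return benefit_rate * (1.0 - (1.0 + u) * math.exp(-u))

    lo, hi = 0.0, 1.0
    while lhs(hi) < target:    # grow the bracket until it contains the root
        hi *= 2.0
    for _ in range(max_iter):  # bisection: halve the bracket each step
        mid = 0.5 * (lo + hi)
        if lhs(mid) < target:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    u = 0.5 * (lo + hi)
    return u / decay_rate      # optimal interval T*; update frequency is 1/T*

# Hypothetical example: an entity worth 10 units per time unit while fresh,
# costing 1 unit per update, decaying at rate 0.5.
T = optimal_update_interval(benefit_rate=10.0, update_cost=1.0, decay_rate=0.5)

As the sketch shows, the optimum depends only on the entity's own benefit, cost, and decay parameters, reflecting the per-entity independence claimed above.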