Data Caching under Number Constraint

Caching can significantly improve the efficiency of information access in networks by reducing access latency and bandwidth usage. However, excessive caching can lead to prohibitive system cost and degraded performance. In this article, we consider the problem of caching a data item in a network in which the item is both read and updated by other nodes, and the number of cache nodes allowed is limited. More formally, given a network graph, the frequencies with which each node reads and writes the data item, and the cost of caching the item at each node, the problem addressed in this article is to select a set of P nodes to cache the data item such that the sum of the read, write (using an optimal Steiner tree), and storage costs is minimized. For networks with a tree topology, we design an optimal dynamic programming algorithm that runs in O(|V|^3 P^2) time, where |V| is the number of nodes in the network and P is the allowed number of caches. For general graph topologies, where the problem is NP-complete, we present a centralized heuristic and its distributed implementation. Through extensive simulations on general graphs, we show that the centralized heuristic performs very close to the exponential-time optimal algorithm for small networks, and that for larger networks, both the distributed implementation and the dynamic programming algorithm applied to an appropriately extracted tree perform quite close to the centralized heuristic.
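
To make the objective concrete, the sketch below evaluates the total cost of a candidate cache set on a tree network and finds the best set of P caches by exhaustive search over all C(|V|, P) subsets. This is only the brute-force baseline, not the article's O(|V|^3 P^2) dynamic program. The cost model is an assumption for illustration: each node i reads from its nearest cache (cost r_i times distance), each writer i pays w_i times the weight of the minimal subtree spanning i and all caches (on a tree this minimal subtree is exactly the optimal Steiner tree), and each cache j adds storage cost s_j. All function names are hypothetical.

```python
from itertools import combinations


def build_tree(edges):
    """Adjacency lists for an undirected weighted tree given as (u, v, weight)."""
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, []).append((v, w))
        adj.setdefault(v, []).append((u, w))
    return adj


def tree_distances(adj, src):
    """Distances from src; paths in a tree are unique, so plain DFS suffices."""
    dist = {src: 0.0}
    stack = [src]
    while stack:
        u = stack.pop()
        for v, w in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + w
                stack.append(v)
    return dist


def tree_steiner_cost(adj, terminals):
    """Weight of the minimal subtree spanning `terminals` (optimal Steiner tree on a tree).

    Rooting at a terminal, an edge (child, parent) is needed iff the
    child's subtree contains at least one terminal.
    """
    terminals = set(terminals)
    if len(terminals) <= 1:
        return 0.0
    root = next(iter(terminals))
    parent, up_weight, order = {root: None}, {root: 0.0}, [root]
    stack = [root]
    while stack:
        u = stack.pop()
        for v, w in adj[u]:
            if v not in parent:
                parent[v], up_weight[v] = u, w
                order.append(v)
                stack.append(v)
    has_terminal = {u: u in terminals for u in order}
    cost = 0.0
    for u in reversed(order):  # children are visited before their parents
        if u != root and has_terminal[u]:
            cost += up_weight[u]
            has_terminal[parent[u]] = True
    return cost


def total_cost(adj, reads, writes, storage, caches):
    """Assumed objective: read + write (Steiner tree) + storage cost."""
    caches = set(caches)
    read_cost = 0.0
    for i, r in reads.items():
        if r:
            d = tree_distances(adj, i)
            read_cost += r * min(d[c] for c in caches)  # nearest cache
    write_cost = sum(w * tree_steiner_cost(adj, caches | {i})
                     for i, w in writes.items() if w)
    storage_cost = sum(storage[c] for c in caches)
    return read_cost + write_cost + storage_cost


def optimal_placement(adj, reads, writes, storage, P):
    """Exponential-time exact search over every P-subset of nodes."""
    best = None
    for caches in combinations(sorted(adj), P):
        cost = total_cost(adj, reads, writes, storage, caches)
        if best is None or cost < best[0]:
            best = (cost, set(caches))
    return best


if __name__ == "__main__":
    adj = build_tree([(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0)])  # path 0-1-2-3
    reads = {0: 5, 1: 1, 2: 1, 3: 5}
    writes = {1: 1}
    storage = {0: 1, 1: 1, 2: 1, 3: 1}
    print(optimal_placement(adj, reads, writes, storage, P=2))  # (7.0, {0, 3})
```

In this toy path, the heavy readers at both ends pull the caches to nodes 0 and 3, even though the single writer at node 1 then has to update a Steiner tree covering the whole path. The exhaustive search is only practical for small networks, which is why the article uses a dynamic program on trees and a heuristic on general graphs.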
