Approximation Algorithms for Data Placement Problems

We develop approximation algorithms for the problem of placing replicated data in arbitrary networks, where nodes may both issue requests for data objects and have capacity for storing them, so as to minimize the average data-access cost. We introduce the data placement problem to model this setting. We have a set of caches $\mathcal{F}$, a set of clients $\mathcal{D}$, and a set of data objects $\mathcal{O}$. Each cache $i\in\mathcal{F}$ can store at most $u_i$ data objects. Each client $j\in\mathcal{D}$ has demand $d_j$ for a specific data object $o(j)\in\mathcal{O}$ and must be assigned to a cache that stores that object. Storing an object $o$ in cache $i$ incurs a storage cost of $f_i^o$, and assigning client $j$ to cache $i$ incurs an access cost of $d_j c_{ij}$. The goal is to find a placement of data objects in caches that respects the capacity constraints, together with an assignment of clients to caches, so as to minimize the total storage and client access cost.

We present a 10-approximation algorithm for this problem. Our algorithm is based on rounding an optimal solution to a natural linear-programming relaxation of the problem. One of the main technical challenges in the rounding is to preserve the cache capacities while incurring only a constant-factor increase in the solution cost.

We also introduce the connected data placement problem to capture settings where write requests are also issued for data objects, so that a mechanism is required to keep the data consistent. We model this by requiring that all caches containing a given object be connected by a Steiner tree to a root for that object, which issues a multicast message upon a write to (any copy of) that object. The total cost now includes the cost of these Steiner trees. We devise a 14-approximation algorithm for this problem.

We show that our algorithms can be adapted to handle two variants of the problem: (a) a $k$-median variant, where there is a specified bound on the number of caches that may contain a given object, and (b) a generalization where objects have lengths and the total length of the objects stored in any cache must not exceed its capacity.
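To make the objective concrete, the following is a minimal sketch of how one might check the feasibility of a candidate solution and evaluate its cost for the basic (unconnected) data placement problem. It is not the approximation algorithm itself. The function name `placement_cost`, the dictionary-based encoding, and the small instance are assumptions made for illustration only; the symbols $u_i$, $f_i^o$, $d_j$, $o(j)$, and $c_{ij}$ map to the arguments `u`, `f`, `d`, `o`, and `c`.

```python
def placement_cost(u, f, d, o, c, stored, assign):
    """Return storage cost + access cost, or raise ValueError if infeasible.

    u[i]       capacity of cache i (maximum number of objects)
    f[i, obj]  cost of storing object obj in cache i
    d[j]       demand of client j
    o[j]       the object requested by client j
    c[i, j]    per-unit access cost between cache i and client j
    stored[i]  set of objects placed in cache i (hypothetical encoding)
    assign[j]  cache that serves client j (hypothetical encoding)
    """
    # Capacity constraint: cache i holds at most u[i] objects.
    for i, objs in stored.items():
        if len(objs) > u[i]:
            raise ValueError(f"cache {i!r} exceeds its capacity")

    # Every client must be assigned to a cache that stores its object.
    for j in d:
        if j not in assign:
            raise ValueError(f"client {j!r} is unassigned")
        if o[j] not in stored.get(assign[j], set()):
            raise ValueError(f"client {j!r} is assigned to a cache without its object")

    storage = sum(f[i, obj] for i, objs in stored.items() for obj in objs)
    access = sum(d[j] * c[assign[j], j] for j in d)
    return storage + access


# Illustrative instance: two unit-capacity caches, two objects, two clients.
u = {"A": 1, "B": 1}
f = {("A", "x"): 2, ("A", "y"): 2, ("B", "x"): 2, ("B", "y"): 2}
d = {1: 3, 2: 1}
o = {1: "x", 2: "y"}
c = {("A", 1): 1, ("B", 1): 4, ("A", 2): 2, ("B", 2): 1}

cost = placement_cost(u, f, d, o, c,
                      stored={"A": {"x"}, "B": {"y"}},
                      assign={1: "A", 2: "B"})
print(cost)  # storage 2 + 2, access 3*1 + 1*1
```

In this instance each cache can hold only one object, so storing both objects in cache A to bring them closer to client 1 is rejected as infeasible; this is the capacity constraint that the rounding step has to preserve.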
