On Data Placement in Distributed Systems

Data placement is the problem of deciding how to assign data items to the nodes of a distributed system so as to optimize one or more performance criteria, such as reducing network congestion or improving load balancing. This document reports on our experience addressing this problem in distributed systems at two different scales: medium-sized, datacenter-scale systems and internet-scale systems.
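To make the problem concrete, the sketch below shows one widely used baseline placement, consistent hashing: each data item goes to the first node found clockwise from the item's hash on a ring. It also computes a simple load-balance metric, the most-loaded node's item count divided by the mean count. This is a generic illustration under stated assumptions, not a description of the techniques reported in this document. The names `HashRing`, `place`, and `imbalance`, the MD5-based ring positions, and the default of 100 virtual nodes per physical node are all hypothetical choices made for the example.

```python
import bisect
import hashlib
from collections import Counter


def _hash(value: str) -> int:
    """Map a string to a 64-bit position on the ring (assumption: MD5-based)."""
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class HashRing:
    """Hypothetical consistent-hashing ring with virtual nodes."""

    def __init__(self, nodes, vnodes=100):
        # Each physical node owns `vnodes` points on the ring to smooth out load.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._positions = [pos for pos, _ in self._ring]

    def place(self, key: str) -> str:
        """Return the node responsible for `key`: first virtual node clockwise."""
        idx = bisect.bisect_right(self._positions, _hash(key))
        if idx == len(self._ring):
            idx = 0  # wrap around the end of the ring
        return self._ring[idx][1]


def imbalance(assignment: dict, nodes) -> float:
    """Most-loaded node's item count divided by the mean count (1.0 = perfect)."""
    loads = Counter(assignment.values())
    counts = [loads.get(n, 0) for n in nodes]
    return max(counts) / (sum(counts) / len(counts))


if __name__ == "__main__":
    nodes = [f"node-{i}" for i in range(8)]
    ring = HashRing(nodes)
    assignment = {f"item-{k}": ring.place(f"item-{k}") for k in range(10_000)}
    print(f"imbalance = {imbalance(assignment, nodes):.2f}")
```

Because the assignment depends only on each key's hash, adding a node moves only the items that now fall on the new node's ring segments, which keeps rebalancing cheap. The same property makes the scheme oblivious to how often items are accessed and which items are accessed together, so it balances item counts but does not by itself reduce network traffic; placement strategies that target the criteria above typically try to address that limitation.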