Designing a DHT for Low Latency and High Throughput

Designing a wide-area distributed hash table (DHT) that provides high-throughput and low-latency network storage is a challenge. Existing systems have explored a range of solutions, including iterative routing, recursive routing, proximity routing and neighbor selection, erasure coding, replication, and server selection. This paper explores the design of these techniques and their interaction in a complete system, drawing on the measured performance of a new DHT implementation and results from a simulator with an accurate Internet latency model. New techniques that resulted from this exploration include use of latency predictions based on synthetic co-ordinates, efficient integration of lookup routing and data fetching, and a congestion control mechanism suitable for fetching data striped over large numbers of servers. Measurements with 425 server instances running on 150 PlanetLab and RON hosts show that the latency optimizations reduce the time required to locate and fetch data by a factor of two. The throughput optimizations result in a sustainable bulk read throughput related to the number of DHT hosts times the capacity of the slowest access link; with 150 selected PlanetLab hosts, the peak aggregate throughput over multiple clients is 12.8 megabytes per second.

[1]  V. Jacobson,et al.  Congestion avoidance and control , 1988, SIGCOMM '88.

[2]  Michael O. Rabin,et al.  Efficient dispersal of information for security, load balancing, and fault tolerance , 1989, JACM.

[3]  Raj Jain,et al.  Analysis of the Increase and Decrease Algorithms for Congestion Avoidance in Computer Networks , 1989, Comput. Networks.

[4]  R. Anderson The Eternity Service , 1996 .

[5]  W. Richard Stevens,et al.  TCP Slow Start, Congestion Avoidance, Fast Retransmit, and Fast Recovery Algorithms , 1997, RFC.

[6]  Keith W. Ross,et al.  Computer networking - a top-down approach featuring the internet , 2000 .

[7]  Andrew V. Goldberg,et al.  A prototype implementation of archival Intermemory , 1999, DL '99.

[8]  Ben Y. Zhao,et al.  OceanStore: an architecture for global-scale persistent storage , 2000, SIGP.

[9]  Peter Druschel,et al.  Pastry: Scalable, distributed object location and routing for large-scale peer-to- , 2001 .

[10]  David R. Karger,et al.  Wide-area cooperative storage with CFS , 2001, SOSP.

[11]  Antony I. T. Rowstron,et al.  Storage management and caching in PAST, a large-scale, persistent peer-to-peer storage utility , 2001, SOSP.

[12]  Antony I. T. Rowstron,et al.  Pastry: Scalable, Decentralized Object Location, and Routing for Large-Scale Peer-to-Peer Systems , 2001, Middleware.

[13]  David R. Karger,et al.  Chord: A scalable peer-to-peer lookup service for internet applications , 2001, SIGCOMM '01.

[14]  John Kubiatowicz,et al.  Erasure Coding Vs. Replication: A Quantitative Comparison , 2002, IPTPS.

[15]  Mark Handley,et al.  Topologically-aware overlay construction and server selection , 2002, Proceedings.Twenty-First Annual Joint Conference of the IEEE Computer and Communications Societies.

[16]  Anne-Marie Kermarrec,et al.  HiScamp: self-organizing hierarchical membership protocol , 2002, EW 10.

[17]  Hari Balakrishnan,et al.  Resilient overlay networks , 2001, SOSP.

[18]  S. Gribble,et al.  King: estimating latency between arbitrary internet end hosts , 2002, CCRV.

[19]  Peter Druschel,et al.  Exploiting network proximity in peer-to-peer overlay networks , 2002 .

[20]  Brian D. Noble,et al.  Proceedings of the 5th Symposium on Operating Systems Design and Implementation Pastiche: Making Backup Cheap and Easy , 2022 .

[21]  Antony I. T. Rowstron,et al.  Squirrel: a decentralized peer-to-peer web cache , 2002, PODC '02.

[22]  Krishna P. Gummadi,et al.  King: estimating latency between arbitrary internet end hosts , 2002, IMW '02.

[23]  Timothy Roscoe,et al.  Mnemosyne: Peer-to-Peer Steganographic Storage , 2002, IPTPS.

[24]  Hui Zhang,et al.  Predicting Internet network distance with coordinates-based approaches , 2002, Proceedings.Twenty-First Annual Joint Conference of the IEEE Computer and Communications Societies.

[25]  Miguel Castro,et al.  Farsite: federated, available, and reliable storage for an incompletely trusted environment , 2002, OPSR.

[26]  David Mazières,et al.  Kademlia: A Peer-to-Peer Information System Based on the XOR Metric , 2002, IPTPS.

[27]  Robert Tappan Morris,et al.  Ivy: a read/write peer-to-peer file system , 2002, OSDI '02.

[28]  Josh Cates,et al.  Robust and efficient data management for a distributed hash table , 2003 .

[29]  Stefan Savage,et al.  Understanding Availability , 2003, IPTPS.

[30]  Rodrigo Rodrigues,et al.  Proceedings of Hotos Ix: the 9th Workshop on Hot Topics in Operating Systems Hotos Ix: the 9th Workshop on Hot Topics in Operating Systems High Availability, Scalable Storage, Dynamic Peer Networks: Pick Two , 2022 .

[31]  Krishna P. Gummadi,et al.  The impact of DHT routing geometry on resilience and proximity , 2003, SIGCOMM '03.

[32]  Jon Crowcroft,et al.  Lighthouses for Scalable Distributed Location , 2003, IPTPS.

[33]  John Kubiatowicz,et al.  Structured Peer-to-Peer Overlays Need Application-Driven Benchmarks , 2003, IPTPS.

[34]  Indranil Gupta,et al.  Kelips: Building an Efficient and Stable P2P DHT through Increased Memory and Background Overhead , 2003, IPTPS.

[35]  David R. Karger,et al.  Chord: a scalable peer-to-peer lookup protocol for internet applications , 2003, TNET.

[36]  Kirsten Hildrum,et al.  Optimizations for Locality-Aware Structured Peer-to-Peer Overlays , 2003 .

[37]  Anjali Gupta,et al.  One Hop Lookups for Peer-to-Peer Overlays , 2003, HotOS.

[38]  Ben Y. Zhao,et al.  Pond: The OceanStore Prototype , 2003, FAST.

[39]  David E. Culler,et al.  A blueprint for introducing disruptive technology into the Internet , 2003, CCRV.

[40]  Maarten van Steen,et al.  An Epidemic Protocol for Managing Routing Tables in Very Large Peer-to-Peer Networks , 2003, DSOM.

[41]  Miguel Castro,et al.  PIC: practical Internet coordinates for distance estimation , 2004, 24th International Conference on Distributed Computing Systems, 2004. Proceedings..

[42]  Yuval Shavitt,et al.  Big-bang simulation for embedding network distances in Euclidean space , 2004, IEEE/ACM Transactions on Networking.

[43]  Robert Tappan Morris,et al.  Vivaldi: a decentralized network coordinate system , 2004, SIGCOMM '04.

[44]  Michael Walfish,et al.  Untangling the Web from DNS , 2004, NSDI.

[45]  Ben Y. Zhao,et al.  Tapestry: a resilient global-scale overlay for service deployment , 2004, IEEE Journal on Selected Areas in Communications.

[46]  John Kubiatowicz,et al.  Handling churn in a DHT , 2004 .

[47]  Miguel Castro,et al.  Performance and dependability of structured peer-to-peer overlays , 2004, International Conference on Dependable Systems and Networks, 2004.

[48]  James Robertson,et al.  UsenetDHT: A Low Overhead Usenet Server , 2004, IPTPS.

[49]  Robert Tappan Morris,et al.  Practical, distributed network coordinates , 2004, Comput. Commun. Rev..

[50]  Hyuk Lim,et al.  Constructing Internet coordinate system based on delay measurement , 2003, IEEE/ACM Transactions on Networking.