Silo: Predictable Message Latency in the Cloud

Many cloud applications can benefit from guaranteed latency for their network messages, however providing such predictability is hard, especially in multi-tenant datacenters. We identify three key requirements for such predictability: guaranteed network bandwidth, guaranteed packet delay and guaranteed burst allowance. We present Silo, a system that offers these guarantees in multi-tenant datacenters. Silo leverages the tight coupling between bandwidth and delay: controlling tenant bandwidth leads to deterministic bounds on network queuing delay. Silo builds upon network calculus to place tenant VMs with competing requirements such that they can coexist. A novel hypervisor-based policing mechanism achieves packet pacing at sub-microsecond granularity, ensuring tenants do not exceed their allowances. We have implemented a Silo prototype comprising a VM placement manager and a Windows filter driver. Silo does not require any changes to applications, guest OSes or network switches. We show that Silo can ensure predictable message latency for cloud applications while imposing low overhead.

[1]  Yuan Yu,et al.  Dryad: distributed data-parallel programs from sequential building blocks , 2007, EuroSys '07.

[2]  Scott Shenker,et al.  Recursively Cautious Congestion Control , 2014, NSDI.

[3]  I. Stoica,et al.  FairCloud: sharing the network in cloud computing , 2011, CCRV.

[4]  Dinesh C. Verma,et al.  A Scheme for Real-Time Channel Establishment in Wide-Area Networks , 1990, IEEE J. Sel. Areas Commun..

[5]  Luiz André Barroso,et al.  The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines , 2009, The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines.

[6]  Andrey Gubarev,et al.  Dremel : Interactive Analysis of Web-Scale Datasets , 2011 .

[7]  Sanjay Ghemawat,et al.  MapReduce: Simplified Data Processing on Large Clusters , 2004, OSDI.

[8]  Di Xie,et al.  The only constant is change: incorporating time-varying network reservations in data centers , 2012, CCRV.

[9]  Scott Shenker,et al.  Supporting real-time applications in an Integrated Services Packet Network: architecture and mechanism , 1992, SIGCOMM '92.

[10]  Albert G. Greenberg,et al.  Data center TCP (DCTCP) , 2010, SIGCOMM '10.

[11]  Brian D. Noble,et al.  Small is better: avoiding latency traps in virtualized data centers , 2013, SoCC.

[12]  Jingren Zhou,et al.  SCOPE: easy and efficient parallel processing of massive data sets , 2008, Proc. VLDB Endow..

[13]  Amin Vahdat,et al.  A scalable, commodity data center network architecture , 2008, SIGCOMM '08.

[14]  Albert G. Greenberg,et al.  VL2: a scalable and flexible data center network , 2009, SIGCOMM '09.

[15]  Ramana Rao Kompella,et al.  Inferring the Network Latency Requirements of Cloud Tenants , 2015, HotOS.

[16]  Jean-Yves Le Boudec,et al.  Network Calculus: A Theory of Deterministic Queuing Systems for the Internet , 2001 .

[17]  S.E. Minzer,et al.  Broadband ISDN and asynchronous transfer mode (ATM) , 1989, IEEE Communications Magazine.

[18]  Peter Druschel,et al.  Soft timers: efficient microsecond software timer support for network processing , 1999, SOSP.

[19]  Rene L. Cruz,et al.  A calculus for network delay, Part I: Network elements in isolation , 1991, IEEE Trans. Inf. Theory.

[20]  Albert G. Greenberg,et al.  A flexible model for resource management in virtual private networks , 1999, SIGCOMM '99.

[21]  Abhay Parekh,et al.  A generalized processor sharing approach to flow control in integrated services networks-the multiple node case , 1993, IEEE INFOCOM '93 The Conference on Computer Communications, Proceedings.

[22]  Helen J. Wang,et al.  SecondNet: a data center network virtualization architecture with bandwidth guarantees , 2010, CoNEXT.

[23]  Amin Vahdat,et al.  SENIC: Scalable NIC for End-Host Rate Limiting , 2014, NSDI.

[24]  S. Jamaloddin Golestani A Stop-and-Go Queueing Framework for Congestion Management , 1990, SIGCOMM.

[25]  Dinan Gunawardena,et al.  Chatty Tenants and the Cloud Network Sharing Problem , 2013, NSDI.

[26]  Robert N. M. Watson,et al.  Queues Don't Matter When You Can JUMP Them! , 2015, NSDI.

[27]  Werner Vogels,et al.  Dynamo: amazon's highly available key-value store , 2007, SOSP.

[28]  Gregory G. Finn Reliable Asynchronous Transfer Protocol (RATP) , 1984, RFC.

[29]  T. N. Vijaykumar,et al.  Deadline-aware datacenter tcp (D2TCP) , 2012, CCRV.

[30]  Xiaowei Yang,et al.  CloudCmp: comparing public cloud providers , 2010, IMC '10.

[31]  Michael I. Jordan,et al.  Managing data transfers in computer clusters with orchestra , 2011, SIGCOMM.

[32]  Hari Balakrishnan,et al.  Cicada: Introducing Predictive Guarantees for Cloud Networks , 2014, HotCloud.

[33]  Srinivasan Keshav,et al.  Comparison of rate-based service disciplines , 1991, SIGCOMM '91.

[34]  Brian D. Noble,et al.  Bobtail: Avoiding Long Tails in the Cloud , 2013, NSDI.

[35]  Rene L. Cruz,et al.  A calculus for network delay, Part II: Network analysis , 1991, IEEE Trans. Inf. Theory.

[36]  Srikanth Kandula,et al.  Speeding up distributed request-response workflows , 2013, SIGCOMM.

[37]  James F. Kurose,et al.  On computing per-session performance bounds in high-speed multi-hop computer networks , 1992, SIGMETRICS '92/PERFORMANCE '92.

[38]  Antony I. T. Rowstron,et al.  Better never than late: meeting deadlines in datacenter networks , 2011, SIGCOMM.

[39]  Brighten Godfrey,et al.  Finishing flows quickly with preemptive scheduling , 2012, CCRV.

[40]  Hitesh Ballani,et al.  Towards predictable datacenter networks , 2011, SIGCOMM 2011.

[41]  Nick McKeown,et al.  pFabric: minimal near-optimal datacenter transport , 2013, SIGCOMM.

[42]  Abhay Parekh,et al.  A generalized processor sharing approach to flow control in integrated services networks-the single node case , 1992, [Proceedings] IEEE INFOCOM '92: The Conference on Computer Communications.

[43]  Song Jiang,et al.  Workload analysis of a large-scale key-value store , 2012, SIGMETRICS '12.

[44]  Cong Xu,et al.  vSlicer: latency-aware virtual machine scheduling via differentiated-frequency CPU slicing , 2012, HPDC '12.

[45]  Randy H. Katz,et al.  Above the Clouds: A Berkeley View of Cloud Computing , 2009 .

[46]  Srinivasan Keshav,et al.  Rate controlled servers for very high-speed networks , 1990, [Proceedings] GLOBECOM '90: IEEE Global Telecommunications Conference and Exhibition.

[47]  Luiz André Barroso,et al.  The tail at scale , 2013, CACM.

[48]  Albert G. Greenberg,et al.  EyeQ: Practical Network Performance Isolation at the Edge , 2013, NSDI.

[49]  Devavrat Shah,et al.  Fastpass , 2014, SIGCOMM.

[50]  Rina Panigrahy,et al.  Validating Heuristics for Virtual Machines Consolidation , 2011 .

[51]  Amin Vahdat,et al.  Less Is More: Trading a Little Bandwidth for Ultra-Low Latency in the Data Center , 2012, NSDI.

[52]  Peter A. Dinda,et al.  VSched: Mixing Batch And Interactive Virtual Machines Using Periodic Real-time Scheduling , 2005, ACM/IEEE SC 2005 Conference (SC'05).

[53]  Scott Shenker,et al.  Shark: fast data analysis using coarse-grained distributed memory , 2012, SIGMOD Conference.

[54]  Herodotos Herodotou,et al.  No one (cluster) size fits all: automatic cluster sizing for data-intensive analytics , 2011, SoCC.

[55]  Amin Vahdat,et al.  Practical TDMA for datacenter ethernet , 2012, EuroSys '12.

[56]  T. N. Vijaykumar,et al.  Deadline-aware datacenter tcp (D2TCP) , 2012, SIGCOMM '12.