Planck: millisecond-scale monitoring and control for commodity networks

Software-defined networking introduces the possibility of building self-tuning networks that constantly monitor network conditions and react rapidly to important events such as congestion. Unfortunately, state-of-the-art monitoring mechanisms for conventional networks require hundreds of milliseconds to seconds to extract global network state, such as link utilization or the identity of "elephant" flows. Such latencies are adequate for responding to persistent issues, e.g., link failures or long-lasting congestion, but are inadequate for responding to transient problems, e.g., congestion induced by bursty workloads sharing a link. In this paper, we present Planck, a novel network measurement architecture that employs oversubscribed port mirroring to extract network information at timescales of 280 µs to 7 ms on a 1 Gbps commodity switch and 275 µs to 4 ms on a 10 Gbps commodity switch, over 11x and 18x faster than recent approaches, respectively (and up to 291x if switch firmware allowed buffering to be disabled on some ports). To demonstrate the value of Planck's speed and accuracy, we use it to drive a traffic engineering application that can reroute congested flows in milliseconds. On a 10 Gbps commodity switch, Planck-driven traffic engineering achieves aggregate throughput within 1 to 4% of optimal for most workloads we evaluated, even with flows as small as 50 MiB, an improvement of up to 53% over previous schemes.
