Scalable High Resolution Traffic Heatmaps: Coherent Queue Visualization for Datacenters

We propose a new monitoring technique for 10 Gbps Ethernet that offers high temporal and spatial resolution. It is based on time-coherent congestion ‘heatmaps’ that reveal the occupancies of all queues at μs granularity. Notably, queues are sampled with a slightly modified version of the new congestion management protocol implemented in commodity Ethernet hardware, i.e., IEEE 802 Quantized Congestion Notification. We evaluate the technique through high-accuracy Layer-2 simulations of a 10 Gbps datacenter Ethernet fabric. Early results show that our proposal enables the detection of ephemeral yet consequential events and transients that are essential for datacenter workload characterization, e.g., TCP Incast, Head-of-Line blocking, and congestion trees, which may arise within tens of μs and were not directly detectable until now.
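To make the idea of a time-coherent heatmap concrete, the following is a minimal sketch of how queue-occupancy samples might be binned into a grid of queues by time intervals. It is not the authors' implementation. The function name `build_heatmap`, the sample tuple layout, and the choice to keep the peak occupancy per cell are all assumptions made for illustration, and the timestamps are assumed to already refer to a common, synchronized clock.

```python
def build_heatmap(samples, bin_us, num_queues):
    """Bin queue-occupancy samples into a queue x time grid.

    samples: iterable of (timestamp_us, queue_id, occupancy_bytes) tuples
             (hypothetical layout; timestamps assumed globally synchronized).
    bin_us:  width of one time bin in microseconds.
    Returns a list of rows, one per queue; each cell holds the peak
    occupancy seen in that bin, or None if no sample fell into it.
    """
    samples = list(samples)
    if not samples:
        return []
    t0 = min(t for t, _, _ in samples)
    t1 = max(t for t, _, _ in samples)
    num_bins = int((t1 - t0) // bin_us) + 1
    grid = [[None] * num_bins for _ in range(num_queues)]
    for t, q, occ in samples:
        b = int((t - t0) // bin_us)
        cur = grid[q][b]
        # Keep the peak so that short-lived congestion is not averaged away.
        grid[q][b] = occ if cur is None else max(cur, occ)
    return grid


if __name__ == "__main__":
    demo = [(0, 0, 100), (5, 0, 300), (12, 1, 50), (25, 0, 10)]
    for row in build_heatmap(demo, bin_us=10, num_queues=2):
        print(row)
```

Keeping the maximum per cell, rather than the mean, is one possible design choice here: it preserves transients that last only a few μs, at the cost of overstating sustained load.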
