Distributed Network Monitoring and Debugging with SwitchPointer

Monitoring and debugging large-scale networks remains a challenging problem. Existing solutions operate at one of the two extremes — systems running at end-hosts (more resources but less visibility into the network) or at network switches (more visibility, but limited resources). We present SwitchPointer, a network monitoring and debugging system that integrates the best of the two worlds. SwitchPointer exploits end-host resources and programmability to collect and monitor telemetry data. The key contribution of SwitchPointer is to efficiently provide network visibility by using switch memory as a “directory service” — each switch, rather than storing the data necessary for monitoring functionalities, stores pointers to end-hosts where relevant telemetry data is stored. We demonstrate, via experiments over real-world testbeds, that SwitchPointer can efficiently monitor and debug network problems, many of which were either hard or even infeasible with existing designs.

[1]  Myungjin Lee,et al.  Simplifying Datacenter Network Debugging with PathDump , 2016, OSDI.

[2]  Samuel T. King,et al.  Debugging the data plane with anteater , 2011, SIGCOMM 2011.

[3]  Nick McKeown,et al.  I Know What Your Packet Did Last Hop: Using Packet Histories to Troubleshoot Networks , 2014, NSDI.

[4]  Xin Jin,et al.  SketchVisor: Robust Network Measurement for Software Packet Processing , 2017, SIGCOMM.

[5]  David Walker,et al.  Compiling Path Queries , 2016, NSDI.

[6]  Jake Silverman,et al.  Felix: Implementing Traffic Measurement on End Hosts Using Program Analysis , 2016, SOSR.

[7]  David A. Maltz,et al.  Data center TCP (DCTCP) , 2010, SIGCOMM 2010.

[8]  George Varghese,et al.  Forwarding metamorphosis: fast programmable match-action processing in hardware for SDN , 2013, SIGCOMM.

[9]  Myungjin Lee,et al.  CherryPick: tracing packet trajectory in software-defined datacenter networks , 2015, SOSR.

[10]  Minlan Yu,et al.  FlowRadar: A Better NetFlow for Data Centers , 2016, NSDI.

[11]  Ming Zhang,et al.  Understanding data center traffic characteristics , 2010, CCRV.

[12]  Vladimir Braverman,et al.  One Sketch to Rule Them All: Rethinking Network Flow Monitoring with UnivMon , 2016, SIGCOMM.

[13]  Alex C. Snoeren,et al.  Inside the Social Network's (Datacenter) Network , 2015, Comput. Commun. Rev..

[14]  Anirudh Sivaraman,et al.  Language-Directed Hardware Design for Network Performance Monitoring , 2017, SIGCOMM.

[15]  Minlan Yu,et al.  Profiling Network Performance for Multi-tier Data Center Applications , 2011, NSDI.

[16]  Walter Willinger,et al.  Network Monitoring as a Streaming Analytics Problem , 2016, HotNets.

[17]  Edward A. Fox,et al.  A faster algorithm for constructing minimal perfect hash functions , 1992, SIGIR '92.

[18]  Ben Y. Zhao,et al.  Packet-Level Telemetry in Large Datacenter Networks , 2015, SIGCOMM.

[19]  Ramesh Govindan,et al.  Trumpet: Timely and Precise Triggers in Data Centers , 2016, SIGCOMM.

[20]  Minlan Yu,et al.  Software Defined Traffic Measurement with OpenSketch , 2013, NSDI.

[21]  Hua Chen,et al.  Pingmesh: A Large-Scale System for Data Center Network Latency Measurement and Analysis , 2015, SIGCOMM.

[22]  David Walker,et al.  Frenetic: a network programming language , 2011, ICFP.