Stroboscope: Declarative Network Monitoring on a Budget

For an Internet Service Provider (ISP), getting an accurate picture of how its network behaves is challenging. Given the volume of traffic carried and the impossibility of controlling end-hosts, ISPs often have no choice but to rely on heavily sampled traffic statistics, which provide only coarse-grained visibility at a less-than-ideal time resolution (seconds or minutes). We present Stroboscope, a system that enables fine-grained monitoring of any traffic flow by instructing routers to mirror millisecond-long traffic slices in a programmatic way. Stroboscope takes as input high-level monitoring queries together with a budget and automatically determines: (i) which flows to mirror; (ii) where to place mirroring rules, using fast and provably correct algorithms; and (iii) when to schedule these rules to maximize coverage while meeting the input budget. We implemented Stroboscope and show that it scales well: it computes schedules for large networks and query sizes in a few seconds, and produces a number of mirroring rules well within the limits of current routers. We also show that Stroboscope works on existing routers and is therefore immediately deployable.
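
To make step (iii) concrete, the following is a minimal sketch of budget-constrained scheduling, not Stroboscope's actual algorithm. It assumes each flow has an estimated mirrored rate and that the budget caps the total rate mirrored within a single time slot; it then packs flows into slots with a first-fit decreasing heuristic. The function name `schedule_mirroring`, the flow names, and the rate units are all hypothetical.

```python
from typing import Dict, List


def schedule_mirroring(demands: Dict[str, float], budget: float) -> List[List[str]]:
    """Pack flows into time slots so each slot's total rate stays within budget.

    Illustrative first-fit decreasing heuristic (an assumption, not the
    scheduler described in the paper). `demands` maps a hypothetical flow
    name to its estimated mirrored rate; `budget` is the per-slot cap.
    """
    slots: List[List[str]] = []  # flows mirrored in each time slot
    loads: List[float] = []      # total mirrored rate already placed in each slot

    # Consider the most demanding flows first.
    for flow, rate in sorted(demands.items(), key=lambda kv: kv[1], reverse=True):
        if rate > budget:
            raise ValueError(f"flow {flow!r} alone exceeds the budget")
        # Place the flow in the first slot that still has room.
        for i, load in enumerate(loads):
            if load + rate <= budget:
                slots[i].append(flow)
                loads[i] += rate
                break
        else:
            # No existing slot fits, so open a new one.
            slots.append([flow])
            loads.append(rate)
    return slots


if __name__ == "__main__":
    example = {"a": 6, "b": 5, "c": 4, "d": 3, "e": 2}
    print(schedule_mirroring(example, budget=10))
```

Under these assumptions, fewer slots means every flow is revisited more often, which is one simple way to think about trading coverage against a fixed mirroring budget.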
