Task Scheduling for Probabilistic In-Band Network Telemetry

In-band Network Telemetry (INT) is a novel framework for monitoring network health in real time, and its recent variant, Probabilistic INT (PINT), reduces INT's bandwidth consumption by taking a probabilistic approach. However, as we show in this paper, a PINT task can be accomplished only when it is allocated a sufficient number of packets, and when many tasks execute in parallel, packets become a scarce resource. Meanwhile, today’s production networks generally execute multiple measurement tasks simultaneously to trace different network states. Scheduling parallel PINT tasks on a single INT flow that has a limited number of packets therefore becomes a critical problem. In this paper, we address this problem for the first time. We propose an algorithm that efficiently schedules multiple parallel PINT tasks on a flow by allocating the flow’s packets to the tasks, and we show that the allocation is optimal. We realize the algorithm as a packet processing pipeline and implement it on both software and hardware programmable switches. Comprehensive evaluation on a FatTree testbed shows that, at a low scheduling overhead, our algorithm can conduct parallel PINT tasks to detect various network faults in a timely and accurate manner. Additionally, the algorithm accomplishes more PINT tasks, with higher quality, than alternative solutions.
