DeltaINT: Toward General In-band Network Telemetry with Extremely Low Bandwidth Overhead

In-band network telemetry (INT) enriches network management at scale by embedding complete device-internal states into each packet along its forwarding path, yet such embedding of INT information also incurs significant bandwidth overhead in the data plane. We propose DeltaINT, a general INT framework that achieves extremely low bandwidth overhead and supports various packet-level and flow-level applications in network management. DeltaINT builds on the insight that state changes are negligible most of the time, so it embeds a state into a packet only when the state change is deemed significant. We theoretically derive the time/space complexities and the bounds on bandwidth mitigation for DeltaINT. We implement DeltaINT in both software and P4. Our evaluation shows that DeltaINT reduces INT bandwidth by up to 93%, and its deployment on a Barefoot Tofino switch incurs limited hardware resource usage.
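The core rule stated above, embedding a state only when its change is significant, can be illustrated with a minimal single-hop sketch. It assumes that "significant" means an absolute difference of at least a fixed threshold, and it uses an unbounded per-flow dictionary as the switch's state cache. `DeltaEmbedder`, `threshold`, and `process` are hypothetical names. A real P4 data plane would use fixed-size register arrays instead, so this sketch is not DeltaINT's actual design.

```python
from collections.abc import Hashable
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class DeltaEmbedder:
    """Hypothetical single-hop cache: embed a state only on significant change."""

    # Assumed significance rule: absolute change of at least `threshold`.
    threshold: float
    # Last state value embedded for each flow (unbounded dict for simplicity).
    last_embedded: Dict[Hashable, float] = field(default_factory=dict)

    def process(self, flow_key: Hashable, state: float) -> Optional[float]:
        """Return the state to embed into this packet, or None to skip it."""
        prev = self.last_embedded.get(flow_key)
        if prev is None or abs(state - prev) >= self.threshold:
            # First packet of the flow, or a significant change: embed and cache.
            self.last_embedded[flow_key] = state
            return state
        # Negligible change: the collector keeps using its last received value.
        return None


if __name__ == "__main__":
    embedder = DeltaEmbedder(threshold=5.0)
    queue_lengths = [10, 11, 12, 20, 21, 9]
    print([embedder.process("flow-1", q) for q in queue_lengths])
    # prints [10, None, None, 20, None, 9]
```

In this example, only three of the six packets carry the state. For every skipped packet, the collector's last received value stays within the threshold of the true state. Raising `threshold` trades state accuracy for lower bandwidth.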
