Programmable Per-Packet Network Telemetry: From Wire to Kafka at Scale

Efficient and secure management of networks requires collecting and analyzing fine-grained telemetry data, preferably in real time. Existing monitoring and analysis frameworks (e.g., NetFlow, SNMP counters) do not provide fine-grained, per-packet information, are hard or impossible to customize, and do not offer an expressive programming interface for extracting information. We present ESnet High Touch Services, a programmable, scalable, and expressive hardware and software solution that produces and analyzes per-packet telemetry information with nanosecond-accurate timing. We highlight our architecture and the most critical performance considerations, which allow the system to process 10.4 million telemetry packets per second using only 5 CPU cores, more than enough to handle 127 Gbit/s of original traffic with a 1512 B MTU. We also present applications of the system that use real-time stream processing with elegant filtering, aggregation, and windowing functionality. Our use cases show that High Touch Services can support a variety of advanced performance monitoring, troubleshooting, and security tasks.
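
The throughput claim above can be sanity-checked with a short calculation. The sketch below is illustrative only: it assumes one telemetry packet per original packet and that every original packet is MTU-sized, neither of which the abstract states. The function name `packets_per_second` and the 38-byte per-frame Ethernet overhead (header, FCS, preamble, inter-frame gap) are assumptions introduced here, not part of the described system.

```python
def packets_per_second(line_rate_bps: float, mtu_bytes: int, overhead_bytes: int = 0) -> float:
    """Packet rate needed to fill a link with fixed-size packets."""
    bits_per_packet = (mtu_bytes + overhead_bytes) * 8
    return line_rate_bps / bits_per_packet


# Figures quoted in the abstract.
LINE_RATE_BPS = 127e9            # 127 Gbit/s of original traffic
MTU_BYTES = 1512                 # 1512 B MTU
TELEMETRY_CAPACITY_PPS = 10.4e6  # telemetry packets processed per second

# Assumption: 14 B header + 4 B FCS + 8 B preamble + 12 B inter-frame gap.
ETHERNET_OVERHEAD_BYTES = 38

if __name__ == "__main__":
    for overhead in (0, ETHERNET_OVERHEAD_BYTES):
        required = packets_per_second(LINE_RATE_BPS, MTU_BYTES, overhead)
        headroom = TELEMETRY_CAPACITY_PPS - required
        print(f"overhead={overhead:2d} B  required={required / 1e6:.2f} Mpps  "
              f"headroom={headroom / 1e6:+.2f} Mpps")
```

How much headroom the quoted 10.4 million packets per second leaves depends on how per-frame overhead is counted. With the assumed 38 bytes of on-wire overhead, the required rate comes out at roughly 10.24 million packets per second; counting MTU bytes alone gives roughly 10.5 million.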
