An optimization-based approach for efficient network monitoring using in-band network telemetry

In recent years, as a result of the proliferation of non-elastic services and the adoption of novel paradigms, monitoring networks with a high level of detail has become crucial to correctly identify and characterize situations related to faults, performance, and security. In-band Network Telemetry (INT) emerges in this context as a promising approach to meet this demand, enabling production packets to directly report their experience inside a network. This type of telemetry enables unprecedented monitoring accuracy and precision, but it degrades performance if applied indiscriminately to all network traffic. One alternative that avoids this situation is to orchestrate telemetry tasks and use only a portion of the traffic to monitor the network via INT. The general problem, in this context, consists of assigning subsets of traffic to carry out INT so as to provide full monitoring coverage while minimizing the overhead. In this paper, we introduce and formalize two variations of the In-band Network Telemetry Orchestration (INTO) problem, prove that both are NP-Complete, and propose polynomial-time heuristics to solve them. In our evaluation using real WAN topologies, we observe that the heuristics produce solutions close to optimal for any network in under one second, that networks can be covered by assigning a number of flows that is linear in the number of interfaces in them, and that it is possible to minimize telemetry load to one interface per flow in most networks.
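To make the coverage idea concrete, the following is a minimal sketch of a generic greedy set-cover assignment. It is not the heuristic proposed in the paper. It assumes that each flow is described by the set of interfaces its path traverses; the function name `greedy_int_assignment`, the flow names, and the interface labels are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Set


def greedy_int_assignment(
    flows: Dict[str, Set[str]], interfaces: Set[str]
) -> Dict[str, List[str]]:
    """Greedy set cover: map each chosen flow to the interfaces it reports on.

    Illustrative only; this is NOT the heuristic from the paper.
    """
    uncovered = set(interfaces)
    assignment: Dict[str, List[str]] = {}
    if not flows:
        return assignment
    while uncovered:
        # Pick the flow whose path crosses the most still-uncovered interfaces
        # (ties are broken by flow name so the result is deterministic).
        best = max(sorted(flows), key=lambda f: len(flows[f] & uncovered))
        gained = flows[best] & uncovered
        if not gained:
            break  # the remaining interfaces are not on any flow's path
        assignment[best] = sorted(gained)
        uncovered -= gained
    return assignment


# Hypothetical toy input: three flows over five switch interfaces.
flows = {
    "f1": {"s1:1", "s2:1", "s3:1"},
    "f2": {"s2:1", "s2:2"},
    "f3": {"s3:1", "s3:2", "s2:2"},
}
interfaces = {"s1:1", "s2:1", "s2:2", "s3:1", "s3:2"}
plan = greedy_int_assignment(flows, interfaces)
```

In this toy input, two of the three flows are enough to cover every interface, and each interface is reported by exactly one flow. A greedy cover of this kind limits the number of flows used, but it does not balance the telemetry load per flow, which is one of the objectives the abstract mentions.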
