Mille-Feuille: Putting ISP traffic under the scalpel

For Internet Service Provider (ISP) operators, getting an accurate picture of how their network behaves is challenging. Given the traffic volumes their networks carry and the impossibility of controlling end hosts, ISP operators are typically forced to randomly sample traffic and rely on aggregated statistics. This provides only coarse-grained visibility, at a time resolution that is far from ideal (seconds or minutes). In this paper, we present Mille-Feuille, a novel monitoring architecture that provides fine-grained visibility over ISP traffic. Mille-Feuille schedules the activation and deactivation of traffic-mirroring rules, which are then provisioned network-wide from a central location within milliseconds. By doing so, Mille-Feuille combines the scalability of sampling with the visibility and controllability of traffic mirroring. As a result, it supports a set of monitoring primitives, ranging from checking key performance indicators (e.g., one-way delay) for single destinations to estimating traffic matrices at sub-second timescales. Our preliminary measurements on existing routers confirm that Mille-Feuille is viable in practice.
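To make the idea of scheduling mirroring rules concrete, the following is a minimal sketch of time-sliced activation, in which only one rule mirrors traffic during each slice. It is not the scheduler described in the paper: the `MirrorRule` type, the `build_schedule` function, the round-robin policy, and the slice and horizon parameters are all hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class MirrorRule:
    """Hypothetical mirroring rule: copy traffic for `prefix` seen at `router`."""
    router: str
    prefix: str


def build_schedule(
    rules: List[MirrorRule], slice_ms: int, horizon_ms: int
) -> List[Tuple[int, int, MirrorRule]]:
    """Assign rules to consecutive time slices in round-robin order.

    Returns (start_ms, end_ms, rule) tuples covering [0, horizon_ms).
    Exactly one rule is active in each slice; the last slice is truncated
    if the horizon is not a multiple of the slice length.
    """
    if slice_ms <= 0:
        raise ValueError("slice_ms must be positive")
    if not rules:
        return []

    schedule = []
    start = 0
    index = 0
    while start < horizon_ms:
        end = min(start + slice_ms, horizon_ms)
        schedule.append((start, end, rules[index % len(rules)]))
        start = end
        index += 1
    return schedule


if __name__ == "__main__":
    # Illustrative rules only; router names and prefixes are made up.
    rules = [
        MirrorRule("router-a", "192.0.2.0/24"),
        MirrorRule("router-b", "198.51.100.0/24"),
        MirrorRule("router-c", "203.0.113.0/24"),
    ]
    for start, end, rule in build_schedule(rules, slice_ms=10, horizon_ms=50):
        print(f"{start:3d}-{end:3d} ms: mirror {rule.prefix} at {rule.router}")
```

In a design along these lines, a central controller would install the rule at the start of its slice and withdraw it at the end, so that the mirrored volume at any instant is bounded by a single rule's traffic. How Mille-Feuille actually chooses, orders, and sizes its slices is not specified in the abstract.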
