Leave Them Microseconds Alone: Scalable Architecture for Maintaining Packet Latency Measurements

Latency has become an important metric for network monitoring since the emergence of latency-sensitive applications (e.g., algorithmic trading and high-performance computing). To meet this need, researchers have proposed architectures such as LDA and RLI that provide fine-grained latency measurements. However, these architectures are ossified by design: each provides only a specific, pre-configured aggregate measurement, either the average latency across all packets (LDA) or per-flow latencies (RLI). In practice, network operators need latency measurements at both finer (e.g., per-packet) and more flexible (e.g., arbitrary flow subsets) levels of granularity. To bridge this gap, we propose MAPLE, an architecture that stores packet-level latencies in routers and allows network operators to query the latency of arbitrary traffic sub-populations. MAPLE is built from scalable data structures with small storage requirements (only 12.8 bits per packet) and uses optimizations such as range queries to reduce query bandwidth significantly (by a factor of 10 compared to the naive approach).
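To make the query model concrete, the following is a minimal Python sketch, assuming packets are identified by integer IDs (for example, a hash over invariant header fields). It is not MAPLE's data structure: the exact dictionary in `PacketLatencyStore` costs far more than 12.8 bits per packet, and the class, method names, and the `to_ranges` encoding are hypothetical illustrations. It shows three ideas from the abstract: per-packet latency storage, aggregation over an operator-chosen sub-population, and compressing runs of consecutive IDs into ranges so a query carries fewer bytes.

```python
from statistics import mean
from typing import Dict, Iterable, List, Optional, Tuple


class PacketLatencyStore:
    """Hypothetical per-router store: packet ID -> latency in microseconds.

    Exact and uncompressed, unlike the compact structures the abstract describes.
    """

    def __init__(self) -> None:
        self._latency_us: Dict[int, float] = {}

    def record(self, packet_id: int, latency_us: float) -> None:
        # Store one packet's measured latency under its (assumed) integer ID.
        self._latency_us[packet_id] = latency_us

    def query(self, packet_id: int) -> Optional[float]:
        # Per-packet lookup; None if the router holds no entry for this packet.
        return self._latency_us.get(packet_id)

    def query_mean(self, packet_ids: Iterable[int]) -> Optional[float]:
        # Mean latency over an arbitrary operator-chosen sub-population,
        # ignoring IDs the router never recorded.
        found = [self._latency_us[p] for p in packet_ids if p in self._latency_us]
        return mean(found) if found else None

    def query_ranges_mean(self, ranges: Iterable[Tuple[int, int]]) -> Optional[float]:
        # Same aggregate, but the query arrives as inclusive (start, end) ID ranges.
        ids = (p for start, end in ranges for p in range(start, end + 1))
        return self.query_mean(ids)


def to_ranges(packet_ids: Iterable[int]) -> List[Tuple[int, int]]:
    # Collapse IDs into inclusive runs of consecutive values, so a query for
    # many adjacent packets is sent as a few (start, end) pairs.
    ranges: List[Tuple[int, int]] = []
    for pid in sorted(set(packet_ids)):
        if ranges and pid == ranges[-1][1] + 1:
            ranges[-1] = (ranges[-1][0], pid)
        else:
            ranges.append((pid, pid))
    return ranges


def storage_bytes(num_packets: int, bits_per_packet: float = 12.8) -> float:
    # Memory implied by a fixed per-packet budget, converted from bits to bytes.
    return num_packets * bits_per_packet / 8
```

For example, after recording latencies of 12.0 and 30.0 microseconds for packets 101 and 102, `query_mean([101, 102])` returns 21.0, and `to_ranges([10, 1, 2, 3, 7, 8])` encodes six IDs as three ranges. At the stated budget of 12.8 bits per packet, one million packets need 1.6 MB.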
