Router Support for Fine-Grained Latency Measurements

An increasing number of datacenter network applications, including automated trading and high-performance computing, have stringent end-to-end latency requirements in which even microsecond variations may be intolerable. Existing technologies such as SNMP, NetFlow, and active probing cannot effectively meet the resulting fine-grained measurement demands. We propose instrumenting routers with a hash-based primitive, which we call a Lossy Difference Aggregator (LDA), to measure latencies down to tens of microseconds even in the presence of packet loss. Because LDA neither modifies nor encapsulates packets, it can be deployed incrementally without changes along the forwarding path. Compared with Poisson-spaced active probing at similar overheads, our LDA mechanism delivers relative error that is orders of magnitude smaller; active probing requires 50-60 times as much bandwidth to achieve similar accuracy. Although ubiquitous deployment is ultimately desirable, it may be hard to achieve in the short term, so we discuss a partial deployment architecture called mPlane that uses LDAs for intrarouter measurements and localized segment measurements for interrouter measurements.
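To make the phrase "hash-based primitive that tolerates loss" concrete, the following is a minimal Python sketch of the core difference-aggregation idea, not the authors' implementation. It assumes that each endpoint hashes every packet into one of a fixed number of buckets, with each bucket holding a timestamp sum and a packet count. It also assumes that sender and receiver clocks are synchronized and that packets can be lost but never duplicated. Under these assumptions, any bucket whose counts match at both ends saw no loss, so its sum difference is the total delay of its packets. The class and function names (`LossyDifferenceAggregator`, `bucket_of`, `estimate_mean_delay`) and the choice of BLAKE2b as the hash are hypothetical. The sketch also omits refinements such as packet sampling that a router implementation would likely need.

```python
import hashlib


def bucket_of(packet_key: bytes, num_buckets: int) -> int:
    """Hash invariant packet bytes to a bucket index.

    Sender and receiver must use the same hash so that a given packet
    lands in the same bucket at both ends.
    """
    digest = hashlib.blake2b(packet_key, digest_size=8).digest()
    return int.from_bytes(digest, "big") % num_buckets


class LossyDifferenceAggregator:
    """Hypothetical minimal LDA: each bucket keeps a timestamp sum and a packet count."""

    def __init__(self, num_buckets: int) -> None:
        self.num_buckets = num_buckets
        self.sums = [0.0] * num_buckets
        self.counts = [0] * num_buckets

    def record(self, packet_key: bytes, timestamp: float) -> None:
        """Fold one packet's timestamp into its bucket."""
        b = bucket_of(packet_key, self.num_buckets)
        self.sums[b] += timestamp
        self.counts[b] += 1


def estimate_mean_delay(sender, receiver):
    """Return (mean delay, packets used) from buckets untouched by loss.

    Assumes packets can only be lost, never duplicated or injected, so
    equal sender/receiver counts in a bucket mean none of its packets was lost.
    """
    delay_sum, used = 0.0, 0
    for s_sum, s_cnt, r_sum, r_cnt in zip(
        sender.sums, sender.counts, receiver.sums, receiver.counts
    ):
        if s_cnt > 0 and s_cnt == r_cnt:
            # Sum of (receive - send) over the bucket equals the difference of the sums.
            delay_sum += r_sum - s_sum
            used += s_cnt
    return (delay_sum / used if used else None), used


# Usage: 1000 packets with a constant 25 us delay; packets 7, 107, ..., 907 are lost.
sender = LossyDifferenceAggregator(64)
receiver = LossyDifferenceAggregator(64)
for i in range(1000):
    key = i.to_bytes(4, "big")  # stand-in for invariant header fields
    t_send = i * 1e-6
    sender.record(key, t_send)
    if i % 100 != 7:
        receiver.record(key, t_send + 25e-6)

mean, used = estimate_mean_delay(sender, receiver)
print(f"mean delay {mean * 1e6:.2f} us from {used} of 1000 packets")
```

Because every surviving packet in this toy run sees the same delay, the estimate matches 25 us up to floating-point error. The estimate draws only on packets in buckets that no loss touched. The design choice here is to discard an entire bucket when its counts disagree, which trades sample size for robustness to loss. With more buckets, each lost packet invalidates fewer good samples, at the cost of more per-router memory.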
