Sampling vs sketching: An information theoretic comparison

The main approaches to high-speed measurement in routers are traffic sampling and sketching. However, it is not known which paradigm is inherently better at extracting information from traffic streams. We tackle this problem for the first time, using Fisher information as the basis for comparison in the context of flow size distribution measurement. We first provide a side-by-side information-theoretic comparison, and then repeat the comparison under added resource constraints derived from simple models of router implementations. Finally, we evaluate the performance of both methods on actual traffic traces.
