Fisher Information in Flow Size Distribution Estimation

The flow size distribution is a useful metric for traffic modeling and management, but estimating it from sampled data is problematic. Previous work has shown that flow sampling (FS) offers enormous statistical benefits over packet sampling, but its high resource requirements preclude its use in routers. We present dual sampling (DS), a two-parameter family that, to a large extent, provides FS-like statistical performance by approaching FS continuously, at a computational cost comparable to that of packet sampling. Our work uses a Fisher information based approach that was recently applied to evaluate a number of sampling schemes, excluding FS, for TCP flows. We revise and extend this approach to make rigorous and fair comparisons between FS, DS, and other methods. We show that DS significantly outperforms other packet-based methods, including Sample and Hold, the closest packet sampling-based competitor to FS. We describe a packet sampling-based implementation of DS and analyze its key computational costs to show that router implementation is feasible. Our approach offers insights into numerous issues, including the notion of “flow quality” for understanding the relative performance of methods, and how and when employing sequence numbers is beneficial. Our work is theoretical, supported by simulations and by case studies on Internet data.
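As a rough illustration of a Fisher information comparison between FS and packet sampling, the following sketch computes the per-flow Fisher information matrix for each scheme and the resulting Cramér-Rao lower bounds. It is not the paper's method. It rests on simplifying assumptions that are ours: flow sizes range over 1..W, both schemes use the same sampling probability `p`, flows that yield no sampled packets are still counted as an observable outcome, and the constraint that the distribution sums to one is ignored. All function names are hypothetical.

```python
import numpy as np
from math import comb


def packet_sampling_matrix(W, p):
    """B[j, k-1] = P(j packets sampled | flow has k packets), binomial thinning."""
    B = np.zeros((W + 1, W))
    for k in range(1, W + 1):
        for j in range(k + 1):
            B[j, k - 1] = comb(k, j) * p**j * (1 - p) ** (k - j)
    return B


def flow_sampling_matrix(W, p):
    """A flow is kept whole with probability p (observe k), else missed (observe 0)."""
    B = np.zeros((W + 1, W))
    B[0, :] = 1 - p
    for k in range(1, W + 1):
        B[k, k - 1] = p
    return B


def fisher_information(B, theta):
    """Per-flow Fisher information for theta when observations have law d = B @ theta.

    Assumption: theta is treated as an unconstrained parameter vector.
    """
    d = B @ theta
    return B.T @ np.diag(1.0 / d) @ B


def cramer_rao_bounds(B, theta):
    """Diagonal of the inverse Fisher information (per-flow variance bounds)."""
    return np.diag(np.linalg.inv(fisher_information(B, theta)))


if __name__ == "__main__":
    W, p = 2, 0.5                      # illustrative values only
    theta = np.full(W, 1.0 / W)        # uniform flow size distribution
    print("packet sampling bounds:", cramer_rao_bounds(packet_sampling_matrix(W, p), theta))
    print("flow sampling bounds:  ", cramer_rao_bounds(flow_sampling_matrix(W, p), theta))
```

Using the same `p` for both schemes means each samples the same expected fraction of packets, which is one simple way to put them on an equal footing. For small `p` and larger `W`, the packet sampling matrix becomes ill-conditioned, so the inversion should be treated with care.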
