Statistical inverse problems on graphs with application to flow volume estimation in computer networks

Estimation of flow volumes in computer networks relies on data that are either highly aggregated or fairly noisy. In this work we address several conceptual and practical aspects of using such data for flow volume estimation. Many of the results are of general statistical interest beyond their application to computer networks. First, we study the identifiability of the joint distribution of flow volumes in a computer network from aggregate (lower-dimensional) measurements collected on its edges. Conceptually, this is a canonical example of a statistical inverse problem. In a significant departure from previous approaches, we investigate settings where flow volumes exhibit dependence. We introduce a number of models that capture spatial, temporal and inter-modal (i.e., between packets and bytes) dependence between flow volumes. We provide sufficient, and in some cases necessary, conditions for the identifiability of the flow volume distribution (up to the mean) under these models. Next, we use these results and models to perform computer network tomography with joint modeling of packet and byte volumes. We highlight various technical challenges, propose several estimation procedures and investigate their properties. Finally, we examine the problem of optimal design in the context of filtering multiple random walks. Specifically, we define the steady-state E-optimal design criterion and show that the underlying optimization problem is convex. The developed methodology is applied to tracking network flow volumes from sampled data, where the design variable controls the sampling rate.
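
To make the inverse problem concrete, the following minimal sketch sets up link measurements Y = AX for a toy network and checks what the link data can recover. The topology, the routing matrix `A`, the helper `second_moment_map` and the variance values are all hypothetical illustrations, not taken from this work. The sketch also assumes independent flows, which is the simpler classical setting; the dependent-flow models studied here are not reproduced.

```python
import numpy as np

# Hypothetical toy topology: 2 links (rows) carrying 3 flows (columns).
# Flow 0 uses link 0, flow 1 uses link 1, flow 2 uses both links.
A = np.array([[1, 0, 1],
              [0, 1, 1]])

def second_moment_map(A):
    """Matrix B such that vech(Cov(Y)) = B @ var(X) when flows are independent.

    Each row corresponds to a link pair (i, j) with i <= j and equals the
    elementwise product of rows i and j of A.
    """
    n_links = A.shape[0]
    rows = [A[i] * A[j] for i in range(n_links) for j in range(i, n_links)]
    return np.array(rows)

B = second_moment_map(A)

# Link means give only rank(A) equations for 3 unknown flow means.
rank_A = np.linalg.matrix_rank(A)
# Link covariances give a full-column-rank system for the 3 flow variances.
rank_B = np.linalg.matrix_rank(B)

# Assumed flow variances, used only to generate an exact link covariance.
true_var = np.array([2.0, 0.5, 1.0])
link_cov = A @ np.diag(true_var) @ A.T
vech = np.array([link_cov[i, j] for i in range(2) for j in range(i, 2)])

# Invert the second-moment equations to recover the flow variances.
est_var, *_ = np.linalg.lstsq(B, vech, rcond=None)

print("rank of A:", rank_A, "rank of B:", rank_B)
print("recovered flow variances:", est_var)
```

In this toy case the flow means cannot be recovered from the link means alone because `A` has fewer rows than columns, whereas the flow variances are recovered from the link covariance, which is consistent with identifiability holding only up to the mean.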
