Estimating Latent Processes on a Network From Indirect Measurements

In a communication network, point-to-point traffic volumes over time are critical for designing protocols that route information efficiently and for maintaining security, whether at the scale of an Internet service provider or within a corporation. While technically feasible, the direct measurement of point-to-point traffic imposes a heavy burden on network performance and is typically not implemented. Instead, indirect aggregate traffic volumes are routinely collected. We consider the problem of estimating point-to-point traffic volumes, x_t, from aggregate traffic volumes, y_t, given information about the network routing protocol encoded in a matrix A. This estimation task can be reformulated as finding the solutions to a sequence of ill-posed linear inverse problems, y_t = A x_t, since the number of origin-destination routes of interest is higher than the number of aggregate measurements available. Here, we introduce a novel multilevel state-space model (SSM) of aggregate traffic volumes with realistic features. We implement a naïve strategy for estimating unobserved point-to-point traffic volumes from indirect measurements of aggregate traffic, based on particle filtering. We then develop a more efficient two-stage inference strategy that relies on model-based regularization: a simple model is used to calibrate regularization parameters that lead to efficient and scalable inference in the multilevel SSM. We apply our methods to corporate and academic networks, where we show that the proposed inference strategy outperforms existing approaches and scales to larger networks. We also design a simulation study to explore the factors that influence performance. Our results suggest that model-based regularization may be an efficient strategy for inference in other complex multilevel models. Supplementary materials for this article are available online.
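
To make the ill-posedness concrete, the following is a minimal sketch in plain Python, not the authors' model or code. It assumes a hypothetical network with two origins and two destinations, writes x for the vector of four point-to-point volumes at one time step and y for the four observed aggregate volumes, and uses an illustrative routing matrix A. All names and numbers are invented for illustration.

```python
# Hypothetical routing matrix A. Columns are origin-destination flows in the
# order (1->1, 1->2, 2->1, 2->2); rows are the observed aggregate volumes.
A = [
    [1, 1, 0, 0],  # total traffic leaving origin 1
    [0, 0, 1, 1],  # total traffic leaving origin 2
    [1, 0, 1, 0],  # total traffic entering destination 1
    [0, 1, 0, 1],  # total traffic entering destination 2
]


def aggregate(A, x):
    """Return y = A x for a list-of-lists matrix A and a list x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


# Two different nonnegative point-to-point traffic vectors (made-up values).
x_true = [5, 3, 2, 4]
x_alt = [4, 4, 3, 3]  # x_true plus the null-space direction (-1, 1, 1, -1)

y_true = aggregate(A, x_true)
y_alt = aggregate(A, x_alt)

print(y_true)           # [8, 6, 7, 7]
print(y_alt == y_true)  # True: the aggregates cannot tell the two apart
```

Because both vectors reproduce the same aggregate volumes, the measurements alone do not determine the point-to-point volumes. Additional structure, such as a prior or a regularizer, is needed to select among the candidate solutions.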
