Semi-supervised dictionary learning for network-wide link load prediction

As a primary indicator of network health, link traffic volumes are used in multiple network management and diagnostic tasks. Although link volumes can be collected with off-the-shelf tools, the corresponding measurement records typically contain errors and missing data. To overcome these challenges, the present paper develops a link traffic prediction algorithm that fills in missing entries and removes noise from the observed entries in an online fashion. The algorithm not only exploits topological knowledge of the network, but also learns from the available historical link traffic data. During its operational phase, the novel algorithm relies on a sparse representation of the link counts over a data-driven dictionary. Prediction of the link counts follows after solving an ℓ1-regularized least-squares problem. Prior to operation, however, a dictionary is trained so that it captures the relevant information from the historical data, allows for a sparse representation, and is aware of the network topology. This is accomplished through a novel semi-supervised dictionary learning scheme that works even when the training data have missing entries. Numerical tests on data from the Internet2 archive corroborate the proposed algorithms.
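The operational phase described above can be illustrated with a minimal sketch. It assumes a dictionary has already been trained and shows only the prediction step: an ℓ1-regularized least-squares fit restricted to the observed links, followed by reconstruction of all links from the sparse code. The solver used here (iterative soft-thresholding, ISTA) is a generic choice made for illustration and is not claimed to be the one used in the paper. The names `predict_link_loads` and `soft_threshold`, the row-per-link layout of `D`, the use of `None` to mark missing measurements, and the values of `lam` and `n_iter` are all hypothetical.

```python
def soft_threshold(x, t):
    """Shrink x toward zero by t (proximal operator of t * |x|)."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def predict_link_loads(D, y, lam=0.1, n_iter=500):
    """Estimate all link loads from partially observed ones.

    D   : list of rows, one per link, one column per dictionary atom
          (hypothetical layout; the dictionary is assumed already trained).
    y   : list of link measurements, with None marking a missing entry.
    lam : weight of the l1 penalty (illustrative value).
    Returns the sparse code s and the reconstructed loads D @ s.
    """
    observed = [i for i, v in enumerate(y) if v is not None]
    n_atoms = len(D[0])

    # The squared Frobenius norm of the observed rows upper-bounds the
    # Lipschitz constant of the gradient, so 1/bound is a safe step size.
    bound = sum(D[i][k] ** 2 for i in observed for k in range(n_atoms)) or 1.0
    step = 1.0 / bound

    s = [0.0] * n_atoms
    for _ in range(n_iter):
        # Residual of the least-squares term, on observed links only.
        resid = {
            i: sum(D[i][k] * s[k] for k in range(n_atoms)) - y[i]
            for i in observed
        }
        grad = [sum(D[i][k] * resid[i] for i in observed) for k in range(n_atoms)]
        # Gradient step followed by soft-thresholding (ISTA update).
        s = [
            soft_threshold(s[k] - step * grad[k], step * lam)
            for k in range(n_atoms)
        ]

    # Reconstruct every link, including those with missing measurements.
    y_hat = [sum(D[i][k] * s[k] for k in range(n_atoms)) for i in range(len(D))]
    return s, y_hat
```

In this sketch a missing link is predicted only because its dictionary row shares atoms with observed links. The topology-aware, semi-supervised training that would produce such a dictionary is not shown.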
