Blind source separation approach to performance diagnosis and dependency discovery

We consider the problem of diagnosing performance problems in distributed systems and networks given end-to-end performance measurements provided by test transactions, or probes. Common techniques for problem diagnosis, such as codebook approaches and network tomography, usually assume a known dependency (e.g., routing) matrix that describes how each probe depends on the system's components. However, collecting full information about routing and/or probe dependencies on all system components can be very costly, if not impossible, in large-scale, dynamic networks and distributed systems. We propose an approach to problem diagnosis and dependency discovery from end-to-end performance measurements for cases in which the dependency/routing information is unknown or only partially known. Our method is based on the blind source separation (BSS) approach, which aims to reconstruct unobserved input signals and the mixing-weights matrix from the observed mixtures of signals. In particular, we apply sparse non-negative matrix factorization techniques, which appear well suited to the problem of recovering network bottlenecks and the dependency (routing) matrix, and show promising experimental results on several realistic network topologies.
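
To make the setting concrete, the following is a minimal sketch of the kind of factorization described above, not the exact algorithm used in this work. It assumes an additive delay model in which the matrix of probe measurements Y (probes by time samples) is approximately A X, where A is the unknown non-negative dependency (routing) matrix and X holds sparse, non-negative per-component delays. The function name `sparse_nmf`, the L1 penalty weight `l1`, the toy matrix `A_true`, and the choice of multiplicative updates are all illustrative assumptions.

```python
import numpy as np

def sparse_nmf(Y, rank, n_iter=500, l1=0.1, seed=0, eps=1e-9):
    """Factor Y ~= A @ X with A, X >= 0 and an L1 sparsity penalty on X.

    Uses multiplicative updates for the squared-error objective; the
    penalty weight `l1` and all names here are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    n_probes, n_samples = Y.shape
    A = rng.random((n_probes, rank)) + eps   # estimated dependency matrix
    X = rng.random((rank, n_samples)) + eps  # estimated component delays
    for _ in range(n_iter):
        # The L1 term in the denominator shrinks small entries of X toward 0.
        X *= (A.T @ Y) / (A.T @ A @ X + l1 + eps)
        A *= (Y @ X.T) / (A @ X @ X.T + eps)
        # Rescale so each column of A has maximum 1; A @ X is unchanged.
        scale = A.max(axis=0) + eps
        A /= scale
        X *= scale[:, None]
    return A, X

# Toy data: 6 probes over 4 components, with a hypothetical routing matrix.
A_true = np.array([
    [1, 1, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 1],
    [1, 0, 0, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
], dtype=float)
rng = np.random.default_rng(1)
# Sparse delays: each component is a bottleneck in roughly 20% of samples.
X_true = rng.random((4, 200)) * 10 * (rng.random((4, 200)) < 0.2)
Y = A_true @ X_true

A_est, X_est = sparse_nmf(Y, rank=4)
rel_err = np.linalg.norm(Y - A_est @ X_est) / np.linalg.norm(Y)
print("relative reconstruction error:", rel_err)
```

The recovered factors are determined only up to a permutation and scaling of the components, so comparing `A_est` with a known routing matrix requires matching columns first. Thresholding `A_est` to obtain a binary dependency matrix would be a further, separate step.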
