Detecting a network failure

Measuring the properties of a large, unstructured network can be difficult: one may not have full knowledge of the network topology, and detailed global measurements may be infeasible. A valuable approach to such problems is to take measurements from selected locations within the network and then aggregate them to infer large-scale properties. One sees this notion applied in settings that range from Internet topology discovery tools to remote software agents that estimate the download times of popular Web pages. Some of the most basic questions about this type of approach, however, are largely unresolved at an analytical level. How reliable are the results? How much does the choice of measurement locations affect the aggregate information one infers about the network? We describe algorithms that yield provable guarantees for a particular problem of this type: detecting a network failure. Suppose we want to detect events of the following form: an adversary destroys up to k nodes or edges, after which two subsets of the nodes, each at least an ε fraction of the network, are disconnected from one another. We call such an event an (ε, k)-partition. One method for detecting such events would be to place "agents" at a set D of nodes, and record a fault whenever two of them become separated from each other. To be a good detection set, D should become disconnected whenever there is an (ε, k)-partition; in this way, it "witnesses" all such events. We show that every graph has a detection set of size polynomial in k and 1/ε, and independent of the size of the graph itself. Moreover, random sampling provides an effective way to construct such a set. Our analysis establishes a connection between graph separators and the notion of VC-dimension, using techniques based on matchings and disjoint paths.
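The detection-set idea can be illustrated with a minimal Python sketch. It is not the paper's algorithm or analysis: the function names, the adjacency-dictionary graph representation, and the restriction to node failures are assumptions made for illustration. The sample size is left to the caller, since the abstract states only that a size polynomial in k and 1/ε suffices and gives no explicit bound. The sketch also assumes that only surviving agents are compared, so an agent sitting on a destroyed node is ignored.

```python
import random
from collections import deque


def components_after_failure(adj, failed_nodes):
    """Map each surviving node to a component id once failed_nodes are deleted."""
    failed = set(failed_nodes)
    label = {}
    next_id = 0
    for start in adj:
        if start in failed or start in label:
            continue
        # Breadth-first search over surviving nodes only.
        label[start] = next_id
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in failed and v not in label:
                    label[v] = next_id
                    queue.append(v)
        next_id += 1
    return label


def sample_detection_set(adj, sample_size, rng=None):
    """Pick agent locations uniformly at random (sample_size is caller-chosen)."""
    rng = rng or random.Random()
    nodes = list(adj)
    return set(rng.sample(nodes, min(sample_size, len(nodes))))


def detects_fault(adj, detection_set, failed_nodes):
    """Return True if two surviving agents end up in different components."""
    label = components_after_failure(adj, failed_nodes)
    # Assumption: agents on destroyed nodes are simply dropped from the check.
    agent_components = {label[d] for d in detection_set if d in label}
    return len(agent_components) > 1
```

For example, if two cliques are joined only through a single cut node and that node fails, a detection set with at least one agent in each clique reports a fault, while a set confined to one clique does not. Random sampling makes it likely that any two large sides of a partition each receive an agent, which is the intuition the abstract points to.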
