Intelligent probing: A cost-effective approach to fault diagnosis in computer networks

We consider the use of probing technology for cost-effective fault diagnosis in computer networks. Probes are test transactions that can be actively selected and sent through the network. This work addresses the probing problem using methods from artificial intelligence; we call the resulting approach intelligent probing. Probes are selected by reasoning about the interactions between their paths. Although finding an optimal probe set is prohibitively expensive for large networks, we implement algorithms that find near-optimal probe sets in linear time. In the diagnosis phase, we use a Bayesian network approach together with a local-inference approximation scheme that avoids the intractability of exact inference in large networks. Our results show that the quality of this approximate inference "degrades gracefully" as uncertainty increases, and that it improves as the quality of the probe set improves.
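The probe-selection idea can be made concrete with a small sketch. It assumes a deterministic single-fault model. A dependency table maps each probe to the set of nodes its path traverses, and a probe fails exactly when its path contains the faulty node. Under that model, a probe set supports diagnosis when every single-fault state, plus the no-fault state, yields a distinct vector of probe outcomes. The names `select_probes`, `diagnose`, and `outcome` and the toy network are hypothetical. The selection loop is a plain greedy search that repeatedly adds the probe that separates the most states. It is not claimed to be the authors' linear-time algorithm. The lookup in `diagnose` also does not implement the Bayesian local-inference scheme, which becomes necessary once probe outcomes are noisy.

```python
from typing import Optional

State = Optional[str]  # name of the single faulty node, or None for "no fault"


def outcome(dep: dict[str, set[str]], probe: str, state: State) -> bool:
    """True if the probe fails in this state, i.e. its path contains the faulty node."""
    return state is not None and state in dep[probe]


def select_probes(dep: dict[str, set[str]], nodes: list[str]) -> list[str]:
    """Greedily add probes until every single-fault state has a distinct outcome vector."""
    states: list[State] = [None] + list(nodes)
    signature = {s: () for s in states}  # outcomes of the chosen probes, per state
    chosen: list[str] = []
    remaining = sorted(dep)  # sorted so ties are broken deterministically

    while len(set(signature.values())) < len(states):
        best, best_groups = None, len(set(signature.values()))
        for p in remaining:
            # Number of distinguishable groups if probe p were added.
            groups = len({signature[s] + (outcome(dep, p, s),) for s in states})
            if groups > best_groups:
                best, best_groups = p, groups
        if best is None:
            break  # no remaining probe separates any further states
        chosen.append(best)
        remaining.remove(best)
        signature = {s: signature[s] + (outcome(dep, best, s),) for s in states}
    return chosen


def diagnose(dep: dict[str, set[str]], chosen: list[str],
             failed: set[str], nodes: list[str]) -> list[State]:
    """Return the states whose predicted probe outcomes match the observed failures."""
    observed = tuple(p in failed for p in chosen)
    states: list[State] = [None] + list(nodes)
    return [s for s in states
            if tuple(outcome(dep, p, s) for p in chosen) == observed]


# Hypothetical toy network: probe name -> nodes on that probe's path.
EXAMPLE_DEP = {
    "p1": {"A", "B"},
    "p2": {"B", "C"},
    "p3": {"C", "D"},
    "p4": {"A", "B", "C", "D"},
}
EXAMPLE_NODES = ["A", "B", "C", "D"]

if __name__ == "__main__":
    chosen = select_probes(EXAMPLE_DEP, EXAMPLE_NODES)
    print(chosen)  # ['p1', 'p2', 'p3']
    print(diagnose(EXAMPLE_DEP, chosen, {"p2", "p3"}, EXAMPLE_NODES))  # ['C']
```

With these toy paths, three probes are enough to separate the five states (no fault, A, B, C, D). The probe `p4`, which covers every node, is never selected because it adds no new separation. Greedy selection is cheap, but on harder instances it can choose more probes than the minimum.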
