In this paper, we describe an experimental study of Internet stability and the origins of failure in Internet protocol backbones. The stability of end-to-end Internet paths depends both on the underlying telecommunication switching system and on the higher-level software and hardware components specific to the Internet's packet-switched forwarding and routing architecture. Although a number of earlier studies have examined failures in the public telecommunication system, little attention has been given to the characterization of Internet stability. Our paper analyzes Internet failures from three different perspectives. We first examine several recent major Internet failures and their probable origins. These empirical observations illustrate the complexity of the Internet and show that, unlike commercial transaction systems, the interactions of the underlying components of the Internet are poorly understood. Next, our examination focuses on the stability of paths between Internet Service Providers. Our analysis is based on the experimental instrumentation of key portions of the Internet infrastructure. Specifically, we logged all of the routing control traffic at five of the largest U.S. Internet exchange points over a three-year period. This study of network reachability information found unexpectedly high levels of path fluctuation and an aggregate low mean time between failures for individual Internet paths. These results point to a high level of instability in the global Internet backbone. While our study of the Internet backbone identifies major trends in the level of path instability between different service providers, these results do not characterize failures inside the network of a service provider. The final portion of our paper focuses on a case study of the network failures observed in a large regional Internet backbone.
This examination of the internal stability of a network includes twelve months of operational failure logs and a review of the internal routing communication data collected between regional backbone routers. We characterize the type and frequency of failures in twenty categories, and describe the failure properties of the regional backbone as a whole.

Supported by National Science Foundation Grant NCR-971017, and gifts from both Intel and Hewlett Packard.