Experimental Study of Internet Stabil-ity and Wide-Area Backbone Failures

In this paper, we describe an experimental study of Internet stability and the origins of failure in Internet protocol backbones. The stability of end-to-end Internet paths is dependent both on the underlying telecommunication switching system, as well as the higher level software and hardware components speci c to the Internet's packet-switched forwarding and routing architecture. Although a number of earlier studies have examined failures in the public telecommunication system, little attention has been given to the characterization of Internet stability. Our paper analyzes Internet failures from three di erent perspectives. We rst examine several recent major Internet failures and their probable origins. These empirical observations illustrate the complexity of the Internet and show that unlike commercial transaction systems, the interactions of the underlying components of the Internet are poorly understood. Next, our examination focuses on the stability of paths between Internet Service Providers. Our analysis is based on the experimental instrumentation of key portions of the Internet infrastructure. Speci cally, we logged all of the routing control tra c at ve of the largest U.S. Internet exchange points over a three year period. This study of network reachability information found unexpectedly high levels of path uctuation and an aggregate low mean time between failures for individual Internet paths. These results point to a high level of instability in the global Internet backbone. While our study of the Internet backbone identi es major trends in the level of path instability between di erent service providers, these results do not characterize failures inside the network of service provider. The nal portion of our paper focuses on a case study of the network failures observed in a large regional Internet backbone. This examination of the internal stability of a network includes twelve months of operational failure logs and a review of the internal routing communication data collected between regional backbone routers. We characterize the type and frequency of failures in twenty categories, and describe the failure properties of the regional backbone as a whole. Supported by National Science Foundation Grant NCR-971017, and gifts from both Intel and Hewlett Packard.