Resilient overlay networks

A Resilient Overlay Network (RON) is an architecture that allows distributed Internet applications to detect and recover from path outages and periods of degraded performance within several seconds, improving over today's wide-area routing protocols that take at least several minutes to recover. A RON is an application-layer overlay on top of the existing Internet routing substrate. The RON nodes monitor the functioning and quality of the Internet paths among themselves, and use this information to decide whether to route packets directly over the Internet or by way of other RON nodes, optimizing application-specific routing metrics.

Results from two sets of measurements of a working RON deployed at sites scattered across the Internet demonstrate the benefits of our architecture. For instance, over a 64-hour sampling period in March 2001 across a twelve-node RON, there were 32 significant outages, each lasting over thirty minutes, over the 132 measured paths. RON's routing mechanism was able to detect, recover, and route around all of them, in less than twenty seconds on average, showing that its methods for fault detection and recovery work well at discovering alternate paths in the Internet. Furthermore, RON was able to improve the loss rate, latency, or throughput perceived by data transfers; for example, about 5% of the transfers doubled their TCP throughput and 5% of our transfers saw their loss probability reduced by 0.05. We found that forwarding packets via at most one intermediate RON node is sufficient to overcome faults and improve performance in most cases. These improvements, particularly in the area of fault detection and recovery, demonstrate the benefits of moving some of the control over routing into the hands of end-systems.
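The routing decision described above, sending packets directly or through at most one intermediate overlay node, can be illustrated with a minimal sketch. It is not the RON implementation: the names `best_path` and `directed_path_count`, the dictionary layout, and the use of a single additive latency metric with `float("inf")` marking an outage are all assumptions made for illustration.

```python
import math

def directed_path_count(n_nodes):
    """Number of ordered (source, destination) pairs among n_nodes."""
    return n_nodes * (n_nodes - 1)

def best_path(metric, src, dst):
    """Pick the cheaper of the direct path and any one-intermediate path.

    metric[a][b] is a hypothetical measured cost (e.g. latency in ms) of
    the direct Internet path from a to b; math.inf marks an outage.
    Returns (cost, path). Ties favour the direct path.
    """
    best_cost, best_route = metric[src][dst], [src, dst]
    for mid in metric:
        if mid in (src, dst):
            continue
        cost = metric[src][mid] + metric[mid][dst]  # additive metric assumed
        if cost < best_cost:
            best_cost, best_route = cost, [src, mid, dst]
    return best_cost, best_route

# Hypothetical three-node overlay in which the direct A -> B path is down.
metric = {
    "A": {"B": math.inf, "C": 20.0},
    "B": {"A": math.inf, "C": 30.0},
    "C": {"A": 20.0, "B": 30.0},
}
cost, route = best_path(metric, "A", "B")
```

With twelve nodes, `directed_path_count(12)` gives the 132 paths quoted in the measurements. A non-additive metric such as loss probability would need a different rule for combining the two hops.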
