Resilient overlay networks

A Resilient Overlay Network (RON) is an architecture that allows distributed Internet applications to detect and recover from path outages and periods of degraded performance within several seconds, an improvement over today's wide-area routing protocols, which take at least several minutes to recover. A RON is an application-layer overlay on top of the existing Internet routing substrate. The RON nodes monitor the functioning and quality of the Internet paths among themselves and use this information to decide whether to route packets directly over the Internet or by way of other RON nodes, optimizing application-specific routing metrics. Results from two sets of measurements of a working RON deployed at sites scattered across the Internet demonstrate the benefits of our architecture. For instance, over a 64-hour sampling period in March 2001 across a twelve-node RON, there were 32 significant outages, each lasting over thirty minutes, over the 132 measured paths (the 12 × 11 ordered pairs of nodes). RON's routing mechanism was able to detect, recover from, and route around all of them in less than twenty seconds on average, showing that its methods for fault detection and recovery work well at discovering alternate paths in the Internet. Furthermore, RON was able to improve the loss rate, latency, or throughput perceived by data transfers; for example, about 5% of the transfers doubled their TCP throughput, and 5% of our transfers saw their loss probability reduced by 0.05. We found that forwarding packets via at most one intermediate RON node is sufficient to overcome faults and improve performance in most cases. These improvements, particularly in the area of fault detection and recovery, demonstrate the benefits of moving some of the control over routing into the hands of end-systems.
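
The route choice described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not RON's actual implementation: the measurement table, the function names (`directed_paths`, `path_latency`, `path_loss`, `best_route`), and the sample numbers are all hypothetical. It assumes latency adds up across overlay hops and that losses on different hops are independent, and, following the observation that one intermediate node usually suffices, it only compares the direct path with single-intermediate routes. It also shows where the 132 paths of a twelve-node RON come from.

```python
from itertools import permutations

# Hypothetical measurement table: (src, dst) -> (latency_ms, loss_prob).
# A real RON gathers such data by probing the overlay paths; the numbers
# used below are invented purely for illustration.
Hop = tuple[str, str]
Measurements = dict[Hop, tuple[float, float]]


def directed_paths(nodes: list[str]) -> list[Hop]:
    """All ordered (src, dst) pairs: n nodes yield n * (n - 1) paths."""
    return list(permutations(nodes, 2))


def path_latency(m: Measurements, hops: list[Hop]) -> float:
    """Latency metric, assumed additive across overlay hops."""
    return sum(m[h][0] for h in hops)


def path_loss(m: Measurements, hops: list[Hop]) -> float:
    """Loss metric, assuming losses on different hops are independent."""
    delivered = 1.0
    for h in hops:
        delivered *= 1.0 - m[h][1]
    return 1.0 - delivered


def best_route(m, nodes, src, dst, metric):
    """Pick the lowest-cost route among the direct path and all routes with
    exactly one intermediate node. Hops missing from `m` (for example, a
    path currently in an outage) are skipped. Returns (cost, route), or
    None if no candidate route exists."""
    candidates = []
    if (src, dst) in m:
        candidates.append([src, dst])
    for mid in nodes:
        if mid not in (src, dst) and (src, mid) in m and (mid, dst) in m:
            candidates.append([src, mid, dst])
    # Turn each route into its list of hops and score it with the metric.
    scored = [(metric(m, list(zip(r, r[1:]))), r) for r in candidates]
    return min(scored, key=lambda s: s[0]) if scored else None


# Usage: the direct A->B path is degraded, so routing via C wins.
nodes = ["A", "B", "C"]
m = {
    ("A", "B"): (120.0, 0.20),
    ("A", "C"): (30.0, 0.01),
    ("C", "B"): (40.0, 0.01),
}
print(len(directed_paths([f"n{i}" for i in range(12)])))  # 132
print(best_route(m, nodes, "A", "B", path_latency))  # (70.0, ['A', 'C', 'B'])
print(best_route(m, nodes, "A", "B", path_loss)[1])  # ['A', 'C', 'B']
```

Passing the metric per call loosely mirrors the idea of application-specific routing: a latency-sensitive application and a bulk transfer could pick different routes over the same measurements.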

D. Andersen et al., Resilient overlay networks, SOSP 2001.