End-to-end WAN service availability

This paper seeks to understand how network failures affect the availability of service delivery across wide-area networks (WANs) and to evaluate classes of techniques for improving end-to-end service availability. Using several large-scale connectivity traces, we develop a model of network unavailability that includes key parameters such as failure location and failure duration. We then use trace-based simulation to evaluate several classes of techniques for coping with network unavailability. We find that caching alone is seldom effective at insulating services from failures, but that the combination of mobile extension code and prefetching can reduce average unavailability by as much as an order of magnitude for classes of service whose semantics support disconnected operation. We find that routing-based techniques may provide significant improvements, but that the gains from many individual techniques are limited because they do not address all significant categories of network failures. By combining the techniques we examine, some systems may be able to reduce average unavailability by as much as one or two orders of magnitude.
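
To make the argument about failure categories concrete, the following is a minimal sketch, not the paper's model or simulator. It assumes a toy trace in which each failure has a location label and a duration, and it assumes that a technique fully hides every failure at the locations it covers. The location names (`near_client`, `middle`, `near_server`), the durations, the observation window, and the mapping of techniques to locations are all hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Failure:
    location: str      # hypothetical category label for where the failure occurs
    duration_s: float  # how long the path was unusable, in seconds


def unavailability(failures, total_s, masked_locations=frozenset()):
    """Fraction of the observation window during which service is unreachable.

    Failures at a location in masked_locations are assumed to be completely
    hidden by some technique. Failures are assumed not to overlap in time.
    """
    down_s = sum(f.duration_s for f in failures
                 if f.location not in masked_locations)
    return down_s / total_s


# Made-up trace: one failure per hypothetical location category.
trace = [
    Failure("near_client", 600.0),
    Failure("middle", 5400.0),
    Failure("near_server", 3000.0),
]
WINDOW_S = 1_000_000.0  # hypothetical length of the observation window

baseline = unavailability(trace, WINDOW_S)
# Assumption: a routing-based technique only avoids mid-path failures.
routing_only = unavailability(trace, WINDOW_S, {"middle"})
# Assumption: adding disconnected operation also hides near-server failures.
combined = unavailability(trace, WINDOW_S, {"middle", "near_server"})

print(f"baseline:     {baseline:.6f}")
print(f"routing only: {routing_only:.6f}")
print(f"combined:     {combined:.6f}")
```

In this toy trace, a technique that covers only one category still leaves a large share of the downtime in place, while covering two categories cuts unavailability by more than a factor of ten. The numbers carry no empirical weight; they only illustrate why techniques that address different failure categories compound.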
