Enabling fault resilience for web services

Today, it is critical for a successful Internet service to be up 100 percent of the time. Server clustering is the most promising approach to meeting this requirement. However, existing Web server-clustering solutions provide only the high availability that comes from their redundancy; they offer no guarantee of fault resilience for the service. In this paper, we address this problem by implementing a mechanism that enables a Web request to be smoothly migrated to, and recovered on, another working node in the presence of server failure. We show that request migration and recovery can be achieved efficiently and transparently to the user. This fault-resilience capability is essential for a variety of critical services (e.g., e-commerce), which are increasingly widespread. Our approach takes an important step towards providing a highly reliable Web service.
