Efficient TCP connection failover in Web server clusters

Web clusters continue to be widely used by large enterprises and organizations to host online services. Providing services without interruption is critical to the revenue and perceived image of both hosts and content providers, so server node failure and recovery should be invisible to clients. Most existing fault-tolerance schemes simply stop dispatching future client requests to the failed server. They do not recover the connections that the node was handling at the time of failure, which makes the failure visible to some clients. Making the failure transparent requires both application-layer and transport-layer mechanisms. Atomic application-layer primary-backup failover schemes have been addressed at length in previous literature, but a transport-layer scheme is still necessary to make the failover invisible to clients. We describe a transparent TCP connection failover mechanism. Besides being transparent, our solution is highly efficient and does not need any dedicated hardware support.
