Optimal redundancy allocation for high availability routers

Optimally allocating redundant routers in high availability (HA) networks is a crucial task. In this paper, a 5-tuple availability function A(N, M, λ, µ, δ) is proposed to determine the minimum number of standby routers required to meet the desired availability (ρ) of an HA router, where N and M are the numbers of active routers and standby routers, respectively, and λ, µ, and δ are a single router's failure rate, repair rate, and failure detection and recovery rate, respectively. We have derived the availability function, and analytical results show that the failure detection and recovery rate (δ) is a key parameter for reducing the minimum required number of standby routers of an HA router. Thus, we also propose a High Availability Management (HAM) middleware, designed on the basis of an open architecture specification called OpenAIS, which reduces the takeover delay (1/δ) by stateful backup. To illustrate the proposed HA router, we have implemented an HA Open Shortest Path First (HA-OSPF) router consisting of two active routers and one standby router. Experimental results show that the takeover delays of the proposed HA-OSPF router were reduced by 6%, 37.3%, and 98.6% compared with those of three industry standard approaches: the Cisco ASR 1000 series router, the Juniper MX series router, and the Virtual Router Redundancy Protocol (VRRP) router, respectively. In addition, in contrast to the industry routers, the proposed HA router, which was designed based on an open architecture specification, is more cost-effective, and its redundancy model can be adjusted more flexibly. Copyright © 2010 John Wiley & Sons, Ltd.
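
The search for the minimum number of standby routers can be illustrated with a small sketch. The abstract does not give the closed form of A(N, M, λ, µ, δ), so the code below substitutes a textbook k-out-of-n model as an assumption: each router is independently up with steady-state probability µ/(λ + µ), and the system counts as available when at least N of the N + M routers are up. This stand-in ignores δ altogether (it behaves as if takeover were instantaneous), so it is not the paper's function and will underestimate the standby count whenever takeover delay matters. The function names and the `max_standby` cap are hypothetical.

```python
from math import comb


def router_availability(lam: float, mu: float) -> float:
    """Steady-state availability of one router with failure rate lam and repair rate mu."""
    return mu / (lam + mu)


def system_availability(n_active: int, m_standby: int, lam: float, mu: float) -> float:
    """Assumed stand-in for A(N, M, lam, mu, delta): k-out-of-n model, delta ignored.

    The system is treated as available when at least n_active of the
    n_active + m_standby routers are up, with independent failures and repairs.
    """
    a = router_availability(lam, mu)
    total = n_active + m_standby
    return sum(
        comb(total, k) * a**k * (1.0 - a) ** (total - k)
        for k in range(n_active, total + 1)
    )


def min_standby(n_active: int, lam: float, mu: float, rho: float, max_standby: int = 64):
    """Return the smallest M whose modelled availability reaches rho, or None if none does."""
    for m in range(max_standby + 1):
        if system_availability(n_active, m, lam, mu) >= rho:
            return m
    return None  # target not reachable within the (hypothetical) cap


if __name__ == "__main__":
    # Illustrative rates only: one failure per 99 time units of uptime, repair rate 99.
    print(min_standby(n_active=2, lam=1.0, mu=99.0, rho=0.999))
```

Under this simplified model, raising the target ρ increases the returned M step by step. Replacing `system_availability` with a model that includes δ would let the same search loop show how faster takeover changes the required number of standby routers.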
