Building bug-tolerant routers with virtualization

Implementation bugs are a critical problem in wide-area networks. The software running on core routers is subject to vulnerabilities, coding mistakes, and misconfiguration. Unfortunately, these problems are often found only after deployment in live networks, where they cause outages, leave networks open to attack, and are difficult to localize and debug. In this work, we propose a bug-tolerant router that runs multiple diverse copies of router software in parallel, so that the copies are unlikely to all fail at the same time. Diversity is achieved by varying the ordering and timing of routing messages, running different routing protocols, running code written by different implementers, and so on. Because each copy is different, the copies will likely produce different outputs when an error occurs, so a simple voting procedure decides which copy's output will "drive" packet forwarding and control-plane communication with other routers. In this paper we motivate our design, describe some design decisions and tradeoffs, and conclude with a description of our ongoing work in building a prototype of this architecture.
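
The voting step can be illustrated with a minimal sketch. The abstract only says the procedure is "simple", so the strict-majority rule below is an assumption, as are the function name `vote`, the use of `None` to mark a copy that crashed or produced no output, and the representation of each copy's output as a hashable value such as a next-hop address.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def vote(outputs: Sequence[Optional[Hashable]]) -> Optional[Hashable]:
    """Pick the output agreed on by a strict majority of all copies.

    `outputs` holds one entry per router copy; None marks a copy that
    crashed or produced nothing (an assumption of this sketch).
    Returns None when no value reaches a strict majority.
    """
    live = [o for o in outputs if o is not None]
    if not live:
        return None
    winner, count = Counter(live).most_common(1)[0]
    # Require more than half of ALL copies, not just the live ones,
    # so a single surviving copy cannot win by default.
    return winner if count > len(outputs) / 2 else None


# Hypothetical example: three copies compute a next hop for one prefix,
# and one copy disagrees because of a bug.
chosen = vote(["10.0.0.1", "10.0.0.1", "10.0.0.9"])
print(chosen)
```

Counting the majority against all copies rather than only the live ones is one possible design choice; a deployment that prefers availability over caution might instead accept the most common output among the copies that responded.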
