AR2C2: Actively replicated controllers for SDN resilient control plane

Software Defined Networking (SDN) is a promising architectural approach based on a programmatic separation of the control and data planes. For high availability, logically centralized SDN controllers are implemented in a distributed fashion. While the controller role features of the OpenFlow protocol allow switches to communicate with multiple controllers, these mechanisms alone are not sufficient to guarantee a resilient control plane, leaving the actual implementation as an open challenge for SDN designers. This paper explores OpenFlow roles for the design of a resilient SDN control plane and proposes AR2C2, an actively replicated multi-controller strategy. As a proof of concept, AR2C2 is implemented on top of the Ryu controller and relies on OpenReplica to keep state consistent among the distributed controllers. Our prototype is experimentally evaluated using real commodity switches and a Mininet-emulated environment. The measured failure recovery times under different workloads shed light on the practical trade-offs between replication overhead and latency, a step towards SDN resiliency.
