The show must go on: Fundamental data plane connectivity services for dependable SDNs

Abstract

Software-defined network (SDN) architectures raise the question of how to handle situations in which the indirection via the control plane is too slow or not possible at all. To provide high availability, connectivity, and robustness, dependable SDNs must also support basic functionality in the data plane. In particular, SDNs should implement functionality for inband network traversals, e.g., to find failover paths in the presence of link failures. This paper shows that robust inband network traversal schemes for dependable SDNs are feasible and presents three fundamentally different mechanisms: simple stateless mechanisms, efficient mechanisms based on packet tagging, and mechanisms based on dynamic state at the switches. We show how these mechanisms can be implemented in today’s SDNs and discuss different applications.
