On the Design of Fault-Tolerance in a Decentralized Software Platform for Power Systems

The vision of the ‘Smart Grid’ assumes a distributed real-time embedded system that implements various monitoring and control functions. Because the reliability of the power grid is critical to modern society, the software supporting the grid must provide fault tolerance and resilience in the resulting cyber-physical system. This paper describes the fault-tolerance features of a software framework called Resilient Information Architecture Platform for Smart Grid (RIAPS). The framework supports various mechanisms for fault detection and mitigation and works in concert with the applications that implement the grid-specific functions. The paper discusses the design philosophy behind the fault-tolerance features and their implementation, and presents an application example showing how the framework can be used to build highly resilient systems.
