Automating efficient variable-grained resiliency for low-power IoT systems

New trends in edge computing encourage pushing more of the compute and analytics to the outer edge and processing most of the data locally. We explore how to transparently provide resiliency for heavy-duty edge applications running on low-power devices that must deal with frequent and unpredictable power disruptions. Two factors complicate this further: (a) memory usage restrictions in tiny low-power devices, which affect not only performance but also the efficacy of the resiliency techniques, and (b) differing resiliency requirements across deployment environments. Nevertheless, an application developer wants to write an application once and have it be reusable across all low-power platforms and all deployment settings. In response to these challenges, we have devised a transparent roll-back recovery mechanism that performs incremental checkpoints at variable granularities with minimal execution time overhead. Our solution co-designs firmware, runtime, and compiler transformations to provide seamless fault tolerance, along with an auto-tuning layer that automatically generates multiple resilient variants of an application. Each variant spreads the application's execution over atomic transactional regions of a certain granularity. Variants with smaller regions provide better resiliency but incur higher overhead; thus, there is no single best option, but rather a Pareto-optimal set of configurations. We apply these strategies across a variety of edge device applications and measure the execution time overhead of the framework on a TI MSP430FR6989. When we restrict uninterrupted atomic intervals to 100 ms, our framework keeps the geometric-mean overhead below 2.48x.
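The trade-off between region granularity and overhead can be illustrated with a small sketch. The code below is not the paper's auto-tuner; it only shows, under stated assumptions, how a Pareto-optimal set could be filtered from candidate variants and how one variant might be chosen for a given deployment. The `Variant` type, the function names, and all numbers in `VARIANTS` are hypothetical and chosen purely for illustration; they are not measurements from the framework.

```python
from typing import List, NamedTuple, Optional


class Variant(NamedTuple):
    """A hypothetical resilient variant of an application."""
    name: str
    max_region_ms: float  # longest uninterrupted atomic region (smaller = more resilient)
    overhead: float       # execution time relative to an unprotected run (smaller = faster)


def dominates(a: Variant, b: Variant) -> bool:
    """True if `a` is no worse than `b` on both axes and strictly better on one."""
    no_worse = a.max_region_ms <= b.max_region_ms and a.overhead <= b.overhead
    strictly_better = a.max_region_ms < b.max_region_ms or a.overhead < b.overhead
    return no_worse and strictly_better


def pareto_front(variants: List[Variant]) -> List[Variant]:
    """Keep only variants that no other variant dominates, sorted by region size."""
    front = [v for v in variants if not any(dominates(o, v) for o in variants)]
    return sorted(front, key=lambda v: v.max_region_ms)


def pick_variant(variants: List[Variant], budget_ms: float) -> Optional[Variant]:
    """Lowest-overhead Pareto variant whose regions fit within `budget_ms`, if any."""
    feasible = [v for v in pareto_front(variants) if v.max_region_ms <= budget_ms]
    return min(feasible, key=lambda v: v.overhead) if feasible else None


# Illustrative values only (assumed, not taken from the paper).
VARIANTS = [
    Variant("fine", 10.0, 4.0),
    Variant("medium", 100.0, 2.0),
    Variant("coarse", 1000.0, 1.3),
    Variant("wasteful", 100.0, 3.0),  # same region size as "medium" but slower
]

if __name__ == "__main__":
    for v in pareto_front(VARIANTS):
        print(v)
    print(pick_variant(VARIANTS, budget_ms=100.0))
```

In this sketch a deployment with frequent power failures would pass a small `budget_ms` and accept the higher overhead of a fine-grained variant, while a deployment with stable power could pass a large budget and run the coarse variant.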
