DVFS in loop accelerators using BLADES

Hardware accelerators are common in embedded systems that have high performance requirements but must still operate within stringent energy constraints. To facilitate short time-to-market and reduced non-recurring engineering costs, automatic systems that can rapidly generate hardware bearing both power and performance in mind are extremely attractive. This paper proposes the BLADES (Better-than-worst-case Loop Accelerator Design) system for automatically designing self-tuning hardware accelerators that dynamically select their best operating frequency and voltage based on environmental conditions, silicon variation, and input data characteristics. Errors in operation are detected by Razor flip-flops, and recovery is initiated. The architecture efficiently supports detection, rollback, and recovery to provide a highly adaptable and configurable loop accelerator. The overhead of deploying Razor flip-flops is significantly reduced by automatically chaining primitive computation operations together. Results on a range of loop accelerators show average energy savings of 32% gained by voltage scaling below the nominal supply voltage.

[1]  Gabriel H. Loh,et al.  Static strands: safely collapsing dependence chains for increasing embedded power efficiency , 2005, LCTES '05.

[2]  Michael Nicolaidis Time redundancy based soft-error tolerance to rescue nanometer technologies , 1999, Proceedings 17th IEEE VLSI Test Symposium (Cat. No.PR00146).

[3]  Dragan Maksimovic,et al.  Closed-loop adaptive voltage scaling controller for standard-cell ASICs , 2002, ISLPED '02.

[4]  Scott A. Mahlke,et al.  Cost sensitive modulo scheduling in a loop accelerator synthesis system , 2005, 38th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'05).

[5]  B. Ramakrishna Rau,et al.  Iterative modulo scheduling: an algorithm for software pipelining loops , 1994, MICRO 27.

[6]  James E. Smith,et al.  Using dynamic binary translation to fuse dependent instructions , 2004, International Symposium on Code Generation and Optimization, 2004. CGO 2004..

[7]  Sanjay Pant,et al.  A self-tuning DVS processor using delay-error detection and correction , 2005, IEEE Journal of Solid-State Circuits.

[8]  R.W. Brodersen,et al.  A dynamic voltage scaled microprocessor system , 2000, IEEE Journal of Solid-State Circuits.

[9]  Shail Aditya,et al.  Cycle-time aware architecture synthesis of custom hardware accelerators , 2002, CASES '02.

[10]  Takashi Ishikawa,et al.  Design methodology of ultra low-power MPEG4 codec core exploiting voltage scaling techniques , 1998, Proceedings 1998 Design and Automation Conference. 35th DAC. (Cat. No.98CH36175).

[11]  David Blaauw,et al.  Razor II: In Situ Error Detection and Correction for PVT and SER Tolerance , 2008, 2008 IEEE International Solid-State Circuits Conference - Digest of Technical Papers.