Asynchronous Dataflow De-Elastisation for Efficient Heterogeneous Synthesis

Algorithmic synthesis provides flexibility in design space exploration and improves design productivity by separating the concerns of system timing and functionality. This enables a designer to cope with the rapid increase in SoC complexity and to employ different computation and communication models with various timing constraints. De-elastisation has emerged as a technique that transforms timing-free concurrent dataflow models into synchronous circuits while offering selective timing flexibility in the design. We adopt de-elastisation in an in-house EDA flow that starts from a system specification in the Balsa language and uses eTeak to generate an elastic network of macro-modules. Based on structural analysis of the resulting network, selected portions are transformed into synchronous circuits in a supervised fashion, targeting better power and performance in the computation domain while preserving fine-grained elasticity between communicating modules to handle timing uncertainties. We evaluate de-elastisation against several popular high-level synthesis technologies, namely LegUp, Bluespec, Chisel and Balsa, using a set of benchmarks from the domain of Database Management Systems (DBMS) accelerators. Our experiments demonstrate the efficacy of dataflow decomposition and de-elastisation on the selected range of applications and their advantages in exploring design trade-offs: a twofold increase in performance and a 15% decrease in power consumption are achievable at the expense of a moderate area overhead.
