SODA: A New Synthesis Infrastructure for Agile Hardware Design of Machine Learning Accelerators

Next-generation systems, such as edge devices, will have to provide efficient processing of machine learning (ML) algorithms while meeting multiple constraints, including energy, performance, area, and latency. However, the quickly evolving field of ML makes it extremely difficult to generate accelerators able to support a wide variety of algorithms. At the same time, designing accelerators by hand in hardware description languages (HDLs) is laborious and time-consuming, and does not allow quick exploration of the design space. This paper discusses the SODA synthesizer, an automated, open-source, high-level ML-framework-to-Verilog compiler, based on the LLVM infrastructure, that targets ML Application-Specific Integrated Circuit (ASIC) chiplets. The SODA synthesizer will enable optimal designs by combining templated, fully tunable IPs and macros with fully custom components generated through high-level synthesis. All these components will be provided through an extendable resource library, characterized with both commercial and open-source logic design flows. Through a closed-loop design space exploration engine, developers will be able to quickly explore their hardware designs along different dimensions.
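The closed-loop exploration flow described above can be illustrated with a minimal sketch: enumerate candidate design points, evaluate each one (in the real flow, by invoking high-level synthesis and logic synthesis and reading back the reported metrics), and retain only the Pareto-optimal configurations. All names here (`estimate_cost`, `pareto_front`, the unroll/pipelining knobs, and the toy cost model) are illustrative assumptions, not part of the actual SODA toolchain.

```python
# Hypothetical sketch of a closed-loop design space exploration (DSE) loop.
# estimate_cost() stands in for a real synthesis run; its area/latency
# numbers are a toy model, not measured results.
from itertools import product

def estimate_cost(unroll, pipeline):
    """Stand-in for HLS + logic synthesis: returns (area, latency)."""
    area = 100 * unroll + (50 if pipeline else 0)        # toy area model
    latency = (1024 // unroll) // (2 if pipeline else 1)  # toy cycle count
    return area, latency

def pareto_front(points):
    """Keep points not strictly dominated in (area, latency); lower is better.

    Each point is a tuple (unroll, pipeline, area, latency).
    """
    def dominates(q, p):
        return (q[2] <= p[2] and q[3] <= p[3]) and (q[2] < p[2] or q[3] < p[3])
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Enumerate the (small) design space and evaluate every point.
design_space = [(u, p, *estimate_cost(u, p))
                for u, p in product([1, 2, 4, 8], [False, True])]

for unroll, pipeline, area, latency in pareto_front(design_space):
    print(f"unroll={unroll} pipeline={pipeline} area={area} latency={latency}")
```

In the real synthesizer the evaluation step is far more expensive, so such a loop would typically use a search heuristic (evolutionary or Bayesian, as in the mapping literature) rather than exhaustive enumeration; the Pareto-filtering structure stays the same.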
