Invited: Software Defined Accelerators From Learning Tools Environment

Next-generation systems, such as edge devices, will need to process machine learning (ML) algorithms efficiently along several metrics, including energy, performance, area, and latency. However, the rapidly evolving field of ML makes it extremely difficult to build accelerators that support a wide variety of algorithms. At the same time, designing accelerators by hand in hardware description languages (HDLs) is hard and time-consuming, and does not allow quick exploration of the design space. In this paper we present the Software Defined Accelerators From Learning Tools Environment (SODALITE), an automated, open-source compiler from high-level ML frameworks to Verilog, targeting ML Application-Specific Integrated Circuit (ASIC) chiplets. The SODALITE approach will implement optimal designs by seamlessly combining custom components generated through high-level synthesis (HLS) with templated, fully tunable Intellectual Property (IP) blocks and macros, integrated in an extensible resource library. Through a closed-loop design space exploration engine, developers will be able to quickly explore their hardware designs along different dimensions.
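The closed-loop exploration described above can be illustrated with a minimal multi-objective search: enumerate candidate design points, score each along the competing metrics, and keep the Pareto-optimal set. The design-point parameters and cost model below are illustrative assumptions, not SODALITE's actual exploration engine or metrics.

```python
import itertools

# Hypothetical accelerator design point: (unroll factor, buffer size in KB,
# number of processing elements). The cost model is a stand-in for the
# estimates a real DSE loop would obtain from HLS reports or simulation.
def evaluate(design):
    unroll, buffer_kb, pes = design
    latency = 1000.0 / (unroll * pes)            # more parallelism -> lower latency
    area = 2.0 * pes + 0.5 * buffer_kb + unroll  # more resources -> larger area
    return latency, area

def pareto_front(designs):
    """Keep designs not dominated on (latency, area); lower is better on both."""
    evaluated = [(d, evaluate(d)) for d in designs]
    front = []
    for d, (lat, ar) in evaluated:
        dominated = any(l2 <= lat and a2 <= ar and (l2, a2) != (lat, ar)
                        for _, (l2, a2) in evaluated)
        if not dominated:
            front.append(d)
    return front

# Exhaustive sweep over a tiny space; a real engine would prune or sample.
space = list(itertools.product([1, 2, 4], [16, 32], [2, 4, 8]))
best = pareto_front(space)
```

In practice the loop is closed by feeding measured quality-of-result data back into the search, so later iterations concentrate on promising regions instead of sweeping the whole space.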
