CFU Playground: Full-Stack Open-Source Framework for Tiny Machine Learning (tinyML) Acceleration on FPGAs

We present CFU Playground, a full-stack open-source framework that enables rapid, iterative design of machine learning (ML) accelerators for embedded ML systems. Our toolchain tightly integrates open-source software, RTL generators, and FPGA tools for synthesis and place-and-route. This full-stack development framework lets engineers explore bespoke architectures that are customized and co-optimized for embedded ML workloads. The rapid deploy-profile-optimize feedback loop lets ML hardware and software developers achieve significant returns on a relatively small investment in customization. Using CFU Playground’s design loop, we demonstrate substantial speedups (55×–75×) and explore the design space of partitioning work between the CPU and the accelerator.
