CFU Playground: Full-Stack Open-Source Framework for Tiny Machine Learning (tinyML) Acceleration on FPGAs

We present CFU Playground, a full-stack open-source framework that enables rapid, iterative design of machine learning (ML) accelerators for embedded ML systems. Our toolchain tightly integrates open-source software, RTL generators, and FPGA tools for synthesis and place-and-route. This full-stack development framework lets engineers explore bespoke architectures that are customized and co-optimized for embedded ML workloads. The rapid deploy-profile-optimize feedback loop lets ML hardware and software developers achieve significant returns on a relatively small investment in customization. Using CFU Playground’s design loop, we demonstrate substantial speedups (55×–75×) and explore the design space of partitioning work between the CPU and the accelerator.
