Towards Automatic and Agile AI/ML Accelerator Design with End-to-End Synthesis

Domain-specific designs offer greater energy efficiency and performance than general-purpose processors. For this reason, modern systems-on-chip devote a significant portion of their silicon area to custom accelerators. However, designing hardware by hand is laborious and time-consuming: the design space is large, and performance, power, and area constraints have no direct counterpart at the software level. Moreover, domain-specific algorithms (e.g., machine learning models) evolve quickly, further complicating accelerator design. To address these issues, this paper presents the SODA Synthesizer, an automated, open-source, modular compiler from high-level ML frameworks to Verilog, targeting Application-Specific Integrated Circuit (ASIC) AI/ML accelerators. SODA tightly couples the Multi-Level Intermediate Representation (MLIR) compiler infrastructure [31] with open-source high-level synthesis (HLS) approaches. As a result, SODA supports various ML frameworks and algorithms and can apply optimizations that combine specialized architecture templates with conventional HLS to generate the hardware modules. In addition, SODA’s closed-loop design space exploration (DSE) engine allows developers to perform end-to-end design space exploration across different metrics and technology nodes.
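To make the closed-loop DSE idea concrete, the sketch below enumerates candidate accelerator configurations, synthesizes and estimates each one, and keeps the design that minimizes an area-delay product. Everything in it is an illustrative assumption rather than SODA's actual interface: the `Config` knobs, the `synthesize_and_estimate` stub (which a real flow would replace with invocations of the MLIR passes and the HLS backend plus parsing of their reports), and the toy cost model.

```python
"""Minimal closed-loop DSE sketch in the spirit of the SODA Synthesizer.

All names here (Config, synthesize_and_estimate, the cost model) are
illustrative assumptions, not SODA's actual interface.
"""
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Config:
    unroll_factor: int  # loop unrolling applied during compilation
    memory_banks: int   # on-chip buffer banking for parallel access


@dataclass
class Estimate:
    latency_cycles: int  # estimated accelerator latency
    area_um2: float      # estimated area for the target technology node


def synthesize_and_estimate(cfg: Config) -> Estimate:
    """Hypothetical stand-in for one closed-loop iteration: a real flow
    would run the MLIR passes and the HLS backend for this configuration
    and parse the resulting reports. A toy analytical model is used here."""
    latency = 10_000 // (cfg.unroll_factor * cfg.memory_banks) + 500
    area = 4_000.0 * cfg.unroll_factor + 1_500.0 * cfg.memory_banks
    return Estimate(latency, area)


def explore(unrolls, banks):
    """Score every candidate by area-delay product; return the best one."""
    best = None
    for u, b in product(unrolls, banks):
        cfg = Config(u, b)
        est = synthesize_and_estimate(cfg)
        score = est.latency_cycles * est.area_um2
        if best is None or score < best[0]:
            best = (score, cfg, est)
    return best


if __name__ == "__main__":
    score, cfg, est = explore(unrolls=[1, 2, 4, 8], banks=[1, 2, 4])
    print(f"best: {cfg} -> {est.latency_cycles} cycles, {est.area_um2:.0f} um^2")
```

Because each candidate evaluation in a real engine involves a full synthesis run, the exhaustive loop above would typically be replaced by a smarter search strategy, e.g., an autotuner such as OpenTuner [21] or an evolutionary approach [14].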

[1] Siddharth Garg et al. CompAct: On-chip Compression of Activations for Low Power Systolic Array Based CNN Acceleration. ACM Trans. Embed. Comput. Syst., 2019.

[2] Hong Wang et al. Loihi: A Neuromorphic Manycore Processor with On-Chip Learning. IEEE Micro, 2018.

[3] Christian Pilato et al. Compiler Infrastructure for Specializing Domain-Specific Memory Templates. arXiv preprint, 2021.

[4] Hanchen Ye et al. ScaleHLS: Achieving Scalable High-Level Synthesis through MLIR. 2021.

[5] Christian Pilato et al. Agile SoC Development with Open ESP: Invited Paper. 2020 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), 2020.

[6] Bertrand A. Maher et al. Glow: Graph Lowering Compiler Techniques for Neural Networks. arXiv preprint, 2018.

[7] Qiang Wu et al. A hierarchical CDFG as intermediate representation for hardware/software codesign. IEEE International Conference on Communications, Circuits and Systems and West Sino Expositions, 2002.

[8] Geoffrey E. Hinton et al. ImageNet classification with deep convolutional neural networks. Commun. ACM, 2012.

[9] Joseph Manzano et al. Invited: Software Defined Accelerators From Learning Tools Environment. 2020 57th ACM/IEEE Design Automation Conference (DAC), 2020.

[10] Bernard Brezzo et al. TrueNorth: Design and Tool Flow of a 65 mW 1 Million Neuron Programmable Neurosynaptic Chip. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2015.

[11] David R. Kaeli et al. Design Space Exploration of Accelerators and End-to-End DNN Evaluation with TFLITE-SOC. 2020 IEEE 32nd International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), 2020.

[12] Gu-Yeon Wei et al. The Aladdin Approach to Accelerator Design and Modeling. IEEE Micro, 2015.

[13] Giacomo Indiveri et al. A Scalable Multicore Architecture With Heterogeneous Memory Structures for Dynamic Neuromorphic Asynchronous Processors (DYNAPs). IEEE Transactions on Biomedical Circuits and Systems, 2017.

[14] Gianluca Palermo et al. Improving evolutionary exploration to area-time optimization of FPGA designs. J. Syst. Archit., 2008.

[15] Sumit Gupta et al. SPARK: A Parallelizing Approach to the High-Level Synthesis of Digital Circuits. 2004.

[16] Gu-Yeon Wei et al. CHIPKIT: An Agile, Reusable Open-Source Framework for Rapid Test Chip Development. IEEE Micro, 2020.

[17] Quoc V. Le et al. Chip Placement with Deep Reinforcement Learning. arXiv preprint, 2020.

[19] Gaurav Menghani. Efficient Deep Learning: A Survey on Making Deep Learning Models Smaller, Faster, and Better. ACM Comput. Surv., 2021.

[20] Yu Ting Chen et al. A Survey and Evaluation of FPGA High-Level Synthesis Tools. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2016.

[21] Shoaib Kamil et al. OpenTuner: An extensible framework for program autotuning. 2014 23rd International Conference on Parallel Architecture and Compilation Techniques (PACT), 2014.

[22] Yuan Xie et al. Model Compression and Hardware Acceleration for Neural Networks: A Comprehensive Survey. Proceedings of the IEEE, 2020.

[23] Vito Giovanni Castellana et al. High-Level Synthesis of Parallel Specifications Coupling Static and Dynamic Controllers. 2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2021.

[24] Nagarajan Kandasamy et al. Endurance-Aware Mapping of Spiking Neural Networks to Neuromorphic Hardware. IEEE Transactions on Parallel and Distributed Systems, 2021.

[25] Marco Minutoli et al. Svelto: High-Level Synthesis of Multi-Threaded Accelerators for Graph Analytics. IEEE Transactions on Computers, 2021.

[26] Haichen Shen et al. TVM: An Automated End-to-End Optimizing Compiler for Deep Learning. OSDI, 2018.

[27] Yuan Yu et al. TensorFlow: A system for large-scale machine learning. OSDI, 2016.

[28] Jeff Dean. Deep Learning for Solving Important Problems. WWW, 2019.

[29] Apala Guha et al. μIR: An intermediate representation for transforming and optimizing the microarchitecture of application accelerators. MICRO, 2019.

[30] Pasi Liljeberg et al. Energy-Efficient Virtual Machines Consolidation in Cloud Data Centers Using Reinforcement Learning. 2014 22nd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing, 2014.

[31] Uday Bondhugula et al. MLIR: Scaling Compiler Infrastructure for Domain Specific Computation. 2021 IEEE/ACM International Symposium on Code Generation and Optimization (CGO), 2021.

[32] Pier Luca Lanzi et al. Ant Colony Heuristic for Mapping and Scheduling Tasks and Communications on Heterogeneous Embedded Systems. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2010.

[33] Massoud Pedram et al. A Deep Reinforcement Learning Framework for Architectural Exploration: A Routerless NoC Case Study. 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA), 2020.

[34] Natalia Gimelshein et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library. NeurIPS, 2019.

[35] Wolfgang Maass. Networks of Spiking Neurons: The Third Generation of Neural Network Models. Electron. Colloquium Comput. Complex., 1996.

[36] David A. Patterson et al. A New Golden Age in Computer Architecture: Empowering the Machine-Learning Revolution. IEEE Micro, 2018.

[37] Yiyu Shi et al. Hardware/Software Co-Exploration of Neural Architectures. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2019.

[38] Sarita V. Adve et al. HPVM: heterogeneous parallel virtual machine. PPoPP, 2018.