You Only Search Once: A Fast Automation Framework for Single-Stage DNN/Accelerator Co-design

DNN/accelerator co-design has shown great potential for improving quality of results (QoR) and performance. Typical approaches separate the design flow into two stages: (1) designing an application-specific DNN model with high accuracy, and (2) building an accelerator tailored to that DNN's characteristics. However, this two-stage flow may fail to deliver the highest composite score, which combines accuracy with hardware-related constraints (e.g., latency and energy efficiency), when building a specific neural-network-based system. In this work, we present YOSO, a single-stage automated framework that aims to generate an optimal software-and-hardware solution, flexibly balancing the goals of accuracy, power, and QoS. Compared with the two-stage method on a baseline systolic-array accelerator and the CIFAR-10 dataset, YOSO achieves 1.42x-2.29x energy reduction or 1.79x-3.07x latency reduction at the same accuracy level, under different user-specified energy and latency optimization constraints, respectively.
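To make the single-stage idea concrete, below is a minimal Python sketch of how a joint search might score each candidate (DNN configuration, accelerator configuration) pair with one composite objective, rather than fixing the DNN first and the hardware second. The function names, penalty form, weights, and budget parameters are illustrative assumptions for exposition only; they are not YOSO's actual formulation.

```python
def composite_score(accuracy, latency_ms, energy_mj,
                    latency_budget_ms, energy_budget_mj,
                    w_lat=0.5, w_energy=0.5):
    """Fold accuracy and hardware costs into one scalar (assumed form).

    Candidates that exceed a user-specified latency or energy budget are
    penalized proportionally, so the search trades off accuracy against
    hardware constraints instead of optimizing accuracy alone.
    """
    score = accuracy
    if latency_ms > latency_budget_ms:
        score -= w_lat * (latency_ms / latency_budget_ms - 1.0)
    if energy_mj > energy_budget_mj:
        score -= w_energy * (energy_mj / energy_budget_mj - 1.0)
    return score


def single_stage_search(candidates, evaluate,
                        latency_budget_ms, energy_budget_mj):
    """Single-stage co-design loop: each candidate pairs a DNN config with
    accelerator parameters, so both are evaluated and ranked together.

    `evaluate` is an assumed user-supplied callback returning
    (accuracy, latency_ms, energy_mj) for one software/hardware pair.
    """
    best_pair, best_score = None, float("-inf")
    for dnn_cfg, hw_cfg in candidates:
        acc, lat, energy = evaluate(dnn_cfg, hw_cfg)
        s = composite_score(acc, lat, energy,
                            latency_budget_ms, energy_budget_mj)
        if s > best_score:
            best_pair, best_score = (dnn_cfg, hw_cfg), s
    return best_pair, best_score
```

The key design choice this sketch illustrates is that a single scalar objective lets one search loop explore the joint software/hardware space, which is what distinguishes a single-stage framework from the two-stage flow described above.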
