Fast Performance Estimation and Design Space Exploration of Manycore-based Neural Processors

In the design of a neural processor, a cycle-accurate simulator is usually built to estimate performance before hardware implementation. Because using the simulator for design space exploration (DSE) of the hardware architecture is time-consuming, we propose a novel method that uses a high-level analytical model for fast DSE. In the model, non-deterministic execution delay is captured by a set of parameters whose contribution to performance is estimated statically by simulation. The viability of the proposed methodology is confirmed on two neural processors with different manycore architectures, achieving a 2000-fold speed-up with an estimation error within 3% compared with simulator-based DSE.

CCS CONCEPTS • Computer systems organization → Multicore architectures
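The general idea of calibrating an analytical model against a few simulator runs and then sweeping the design space with the model alone can be sketched as follows. This is a minimal illustration, not the authors' model: the cost formulas, the single contention parameter `k`, the workload constants, and `toy_simulator` (a stand-in for a cycle-accurate simulator) are all assumptions introduced here.

```python
from itertools import product

WORK = 1_000_000    # hypothetical total compute cycles of the workload
TRAFFIC = 400_000   # hypothetical total memory traffic (arbitrary units)

def ideal_cycles(cores, bandwidth):
    """Deterministic part: compute time or memory time, whichever dominates."""
    return max(WORK / cores, TRAFFIC / bandwidth)

def contention_term(cores, bandwidth):
    """Assumed shape of the non-deterministic delay (cores competing for bandwidth)."""
    return TRAFFIC * cores / bandwidth

def toy_simulator(cores, bandwidth):
    """Stand-in for a slow cycle-accurate simulator (hypothetical)."""
    return ideal_cycles(cores, bandwidth) + 0.02 * contention_term(cores, bandwidth)

def fit_k(samples):
    """Least-squares fit of k in: simulated = ideal + k * contention."""
    num = sum(contention_term(c, b) * (sim - ideal_cycles(c, b)) for c, b, sim in samples)
    den = sum(contention_term(c, b) ** 2 for c, b, _ in samples)
    return num / den

def estimate(cores, bandwidth, k):
    """Analytical estimate used in place of the simulator during DSE."""
    return ideal_cycles(cores, bandwidth) + k * contention_term(cores, bandwidth)

def explore(core_options, bandwidth_options, k):
    """Exhaustive DSE over the analytical model; returns the fastest (cores, bandwidth)."""
    return min(product(core_options, bandwidth_options),
               key=lambda cfg: estimate(cfg[0], cfg[1], k))

# Calibrate k on three simulated configurations, then sweep all fifteen analytically.
samples = [(c, b, toy_simulator(c, b)) for c, b in [(4, 8), (16, 16), (64, 32)]]
k = fit_k(samples)
best = explore([4, 8, 16, 32, 64], [8, 16, 32], k)
print(f"fitted k = {k:.4f}, best configuration = {best}")
```

In this toy setup only three configurations are ever simulated; the remaining points are evaluated by the closed-form estimate, which is where a speed-up over simulator-based DSE would come from.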
