MAERI: Enabling Flexible Dataflow Mapping over DNN Accelerators via Programmable Interconnects

The microarchitecture of DNN inference engines is an active research topic in the computer architecture community because DNN accelerators are needed to maximize performance/watt for mass deployment across phones, cars, and other devices. This has led to a flurry of ASIC DNN accelerator proposals in academia in recent years. Industry is also investing heavily, with every major company developing its own neural network accelerator, which has resulted in a myriad of dataflow patterns. We claim that different dataflows essentially lead to different kinds of data movement within an accelerator. Thus, to support arbitrary dataflows in accelerators, we propose to make the interconnects programmable. We achieve this by augmenting all compute elements (multipliers and adders) and on-chip buffers with tiny switches, which can be configured at compile time or at runtime. Our design, MAERI, connects these switches via a new configurable, non-blocking tree topology to provide not only programmability but also high throughput.

ACM Reference Format: Hyoukjun Kwon, Ananda Samajdar, and Tushar Krishna. 2018. MAERI: Enabling Flexible Dataflow Mapping over DNN Accelerators via Programmable Interconnects. In Proceedings of SysML '18. ACM, New York, NY, USA, 3 pages. https://doi.org/10.1145/nnnnnnn.nnnnnnn
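The mechanism the abstract describes, adder switches programmed so that arbitrarily sized groups of multiplier outputs collapse into single partial sums, can be illustrated with a short functional sketch. The sketch below models only what a programmed tree computes (a segmented, group-wise reduction), not the actual switch microarchitecture or link widths, and the names (art_reduce, group_ids) are illustrative rather than taken from the paper.

# Functional sketch (illustrative only; not the paper's switch design): a
# "programmed" reduction tree modeled as a segmented reduction. Each pass
# merges adjacent partial sums that share a group id (an adder switch set
# to ADD) and forwards the rest unchanged (a switch set to BYPASS), so
# every group of multiplier outputs collapses to one value.

def art_reduce(products, group_ids):
    """Reduce each contiguous group of multiplier outputs to one sum."""
    level = list(zip(group_ids, products))
    while True:
        nxt, i, merged = [], 0, False
        while i < len(level):
            if i + 1 < len(level) and level[i][0] == level[i + 1][0]:
                # Neighboring partial sums belong to the same output
                # neuron: the switch adds them.
                nxt.append((level[i][0], level[i][1] + level[i + 1][1]))
                i += 2
                merged = True
            else:
                # Group boundary: the switch forwards the value upward.
                nxt.append(level[i])
                i += 1
        level = nxt
        if not merged:
            return [value for _, value in level]

# Example: eight multiplier outputs mapped onto two output neurons of
# different sizes, something a fixed reduction tree cannot do.
print(art_reduce([1, 2, 3, 4, 5, 6, 7, 8], [0, 0, 0, 1, 1, 1, 1, 1]))  # [6, 30]

In this sketch, reconfiguring the accelerator amounts to supplying a new group_ids vector, which corresponds to the compile-time or runtime switch configuration the abstract refers to.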
