论文信息 - f-CNNx: A Toolflow for Mapping Multiple Convolutional Neural Networks on FPGAs

f-CNNx: A Toolflow for Mapping Multiple Convolutional Neural Networks on FPGAs

The predictive power of Convolutional Neural Networks (CNNs) has been an integral factor for emerging latency-sensitive applications, such as autonomous drones and vehicles. Such systems employ multiple CNNs, each one trained for a particular task. The efficient mapping of multiple CNNs on a single FPGA device is a challenging task as the allocation of compute resources and external memory bandwidth needs to be optimised at design time. This paper proposes f-CNNx, an automated toolflow for the optimised mapping of multiple CNNs on FPGAs, comprising a novel multi-CNN hardware architecture together with an automated design space exploration method that considers the user-specified performance requirements for each model to allocate compute resources and generate a synthesisable accelerator. Moreover, f-CNNx employs a novel scheduling algorithm that alleviates the limitations of the memory bandwidth contention between CNNs and sustains the high utilisation of the architecture. Experimental evaluation shows that f-CNNx's designs outperform contention-unaware FPGA mappings by up to 50% and deliver up to 6.8x higher performance-per-Watt over highly optimised GPU designs for multi-CNN systems.

Christos-Savvas Bouganis | Stylianos I. Venieris | C. Bouganis

[1] Rob Fergus,et al. Visualizing and Understanding Convolutional Networks , 2013, ECCV.

[2] Vijay Vasudevan,et al. Learning Transferable Architectures for Scalable Image Recognition , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[3] Jianxiong Xiao,et al. DeepDriving: Learning Affordance for Direct Perception in Autonomous Driving , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[4] Andrew Zisserman,et al. Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[5] Christos-Savvas Bouganis,et al. Toolflows for Mapping Convolutional Neural Networks on FPGAs , 2018, ACM Comput. Surv..

[6] Giovanni De Micheli,et al. Synthesis and Optimization of Digital Circuits , 1994 .

[7] Alex Krizhevsky,et al. Learning Multiple Layers of Features from Tiny Images , 2009 .

[8] Bo Chen,et al. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications , 2017, ArXiv.

[9] Charles Audet,et al. Analysis of Generalized Pattern Searches , 2000, SIAM J. Optim..

[10] Christos-Savvas Bouganis,et al. Latency-driven design for FPGA-based convolutional neural networks , 2017, 2017 27th International Conference on Field Programmable Logic and Applications (FPL).

[11] E.A. Lee,et al. Synchronous data flow , 1987, Proceedings of the IEEE.

[12] Christos-Savvas Bouganis,et al. fpgaConvNet: A Framework for Mapping Convolutional Neural Networks on FPGAs , 2016, 2016 IEEE 24th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM).

[13] Hari Angepat,et al. A cloud-scale acceleration architecture , 2016, 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[14] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[15] Nikolai Smolyanskiy,et al. Toward low-flying autonomous MAV trail navigation using deep neural networks for environmental awareness , 2017, 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[16] Michele Magno,et al. Accelerating real-time embedded scene labeling with convolutional networks , 2015, 2015 52nd ACM/EDAC/IEEE Design Automation Conference (DAC).

[17] Yoshua Bengio,et al. Gradient-based learning applied to document recognition , 1998, Proc. IEEE.

[18] Roberto Cipolla,et al. SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[19] A. Morrish,et al. Cyclic Scheduling , 1971, Journal of Nursing Administration.

[20] Xin Zhang,et al. End to End Learning for Self-Driving Cars , 2016, ArXiv.

[21] T. C. Edwin Cheng,et al. Complexity of cyclic scheduling problems: A state-of-the-art survey , 2010, Comput. Ind. Eng..

[22] Kaiming He,et al. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[23] Yu Cao,et al. An automatic RTL compiler for high-throughput FPGA implementation of diverse deep convolutional neural networks , 2017, 2017 27th International Conference on Field Programmable Logic and Applications (FPL).