Light-OPU: An FPGA-based Overlay Processor for Lightweight Convolutional Neural Networks
Lei He | Kun Wang | Tiandong Zhao | Yunxuan Yu
[1] Quoc V. Le,et al. Searching for MobileNetV3 , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[2] Haichen Shen,et al. TVM: An Automated End-to-End Optimizing Compiler for Deep Learning , 2018, OSDI.
[3] Forrest N. Iandola,et al. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <1MB model size , 2016, ArXiv.
[4] Dumitru Erhan,et al. Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[5] Eunhyeok Park,et al. Value-aware Quantization for Training and Inference of Neural Networks , 2018, ECCV.
[6] Xiangyu Zhang,et al. ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[7] Ali Farhadi,et al. YOLO9000: Better, Faster, Stronger , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[8] Sergey Ioffe,et al. Rethinking the Inception Architecture for Computer Vision , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[9] Loukas P. Petrou,et al. Expanding a robot's life: Low power object recognition via FPGA-based DCNN deployment , 2018, 2018 7th International Conference on Modern Circuits and Systems Technologies (MOCAST).
[10] Bo Chen,et al. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications , 2017, ArXiv.
[11] Yaser Sheikh,et al. OpenPose: Realtime Multi-Person 2D Pose Estimation Using Part Affinity Fields , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[12] Mark Sandler,et al. MobileNetV2: Inverted Residuals and Linear Bottlenecks , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[13] Xiangyu Zhang,et al. ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design , 2018, ECCV.
[14] Peng Zhang,et al. Automated systolic array architecture synthesis for high throughput CNN inference on FPGAs , 2017, 2017 54th ACM/EDAC/IEEE Design Automation Conference (DAC).
[15] Srihari Cadambi,et al. A dynamically configurable coprocessor for convolutional neural networks , 2010, ISCA.
[16] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[17] Asit K. Mishra,et al. From high-level deep neural models to FPGAs , 2016, 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[18] Yu Wang,et al. Going Deeper with Embedded FPGA Platform for Convolutional Neural Network , 2016, FPGA.
[19] Christos-Savvas Bouganis,et al. fpgaConvNet: Mapping Regular and Irregular Convolutional Neural Networks on FPGAs , 2019, IEEE Transactions on Neural Networks and Learning Systems.
[20] Jason Cong,et al. Caffeine: Towards uniformed representation and acceleration for deep convolutional neural networks , 2016, 2016 IEEE/ACM International Conference on Computer-Aided Design (ICCAD).
[21] Xinming Huang,et al. A CNN Accelerator on FPGA Using Depthwise Separable Convolution , 2018, IEEE Transactions on Circuits and Systems II: Express Briefs.
[22] Andrew Zisserman,et al. Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.
[23] Yu Cao,et al. An automatic RTL compiler for high-throughput FPGA implementation of diverse deep convolutional neural networks , 2017, 2017 27th International Conference on Field Programmable Logic and Applications (FPL).
[24] Karin Strauss,et al. Accelerating Deep Convolutional Neural Networks Using Specialized Hardware , 2015.
[25] Jason Cong,et al. Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks , 2015, FPGA.
[26] Masoud Daneshtalab,et al. ADONN: Adaptive Design of Optimized Deep Neural Networks for Embedded Systems , 2018, 2018 21st Euromicro Conference on Digital System Design (DSD).
[27] Srimat Chakradhar,et al. A dynamically configurable coprocessor for convolutional neural networks , 2010.
[28] Luciano Lavagno,et al. Synetgy: Algorithm-hardware Co-design for ConvNet Accelerators on Embedded FPGAs , 2018, FPGA.
[29] François Chollet,et al. Xception: Deep Learning with Depthwise Separable Convolutions , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[30] Lei He,et al. OPU: An FPGA-Based Overlay Processor for Convolutional Neural Networks , 2020, IEEE Transactions on Very Large Scale Integration (VLSI) Systems.
[31] Ajith Pasqual,et al. EdgeNet: SqueezeNet like Convolution Neural Network on Embedded FPGA , 2018, 2018 25th IEEE International Conference on Electronics, Circuits and Systems (ICECS).
[32] Yann LeCun,et al. CNP: An FPGA-based processor for Convolutional Networks , 2009, 2009 International Conference on Field Programmable Logic and Applications.
[33] Yu Cao,et al. Throughput-Optimized OpenCL-based FPGA Accelerator for Large-Scale Convolutional Neural Networks , 2016, FPGA.
[34] Seungwon Lee,et al. Quantization for Rapid Deployment of Deep Neural Networks , 2018, ArXiv.
[35] Wayne Luk,et al. Automatic Optimising CNN with Depthwise Separable Convolution on FPGA: (Abstract Only) , 2018, FPGA.
[36] David B. Thomas,et al. Redundancy-Reduced MobileNet Acceleration on Reconfigurable Logic for ImageNet Classification , 2018, ARC.
[37] Forrest N. Iandola,et al. DenseNet: Implementing Efficient ConvNet Descriptor Pyramids , 2014, ArXiv.
[38] Junzhong Shen,et al. An Efficient Design Flow for Accelerating Complicated-connected CNNs on a Multi-FPGA Platform , 2019, ICPP.
[39] Srihari Cadambi,et al. A programmable parallel accelerator for learning and classification , 2010, 2010 19th International Conference on Parallel Architectures and Compilation Techniques (PACT).
[40] Lei He,et al. Overview of a FPGA-Based Overlay Processor , 2019, 2019 China Semiconductor Technology International Conference (CSTIC).
[41] Chen Feng,et al. A Quantization-Friendly Separable Convolution for MobileNets , 2018, 2018 1st Workshop on Energy Efficient Machine Learning and Cognitive Computing for Embedded Applications (EMC2).