A Novel 1D-Convolution Accelerator for Low-Power Real-time CNN processing on the Edge
Hamed Tabkhi | Nasim Soltani | Justin Sanchez | Ramachandra Vikas Chamarthi | Adarsh Sawant
[1] Eriko Nurvitadhi,et al. Can FPGAs Beat GPUs in Accelerating Next-Generation Deep Neural Networks? , 2017, FPGA.
[2] Kilian Q. Weinberger,et al. Densely Connected Convolutional Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[3] Hamed Tabkhi,et al. A Reconfigurable Streaming Processor for Real-Time Low-Power Execution of Convolutional Neural Networks at the Edge , 2018, EDGE.
[4] David C. Anastasiu,et al. The NVIDIA AI City Challenge , 2017, 2017 IEEE SmartWorld, Ubiquitous Intelligence & Computing, Advanced & Trusted Computed, Scalable Computing & Communications, Cloud & Big Data Computing, Internet of People and Smart City Innovation (SmartWorld/SCALCOM/UIC/ATC/CBDCom/IOP/SCI).
[5] Xiaogang Wang,et al. DeepID3: Face Recognition with Very Deep Neural Networks , 2015, ArXiv.
[6] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[7] A. Strollo,et al. Low-power approximate MAC unit , 2017, 2017 13th Conference on Ph.D. Research in Microelectronics and Electronics (PRIME).
[8] Asit K. Mishra,et al. From high-level deep neural models to FPGAs , 2016, 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[9] David Gregg,et al. Low Complexity Multiply-Accumulate Units for Convolutional Neural Networks with Weight-Sharing , 2018, ACM Trans. Archit. Code Optim.
[10] Song Han,et al. EIE: Efficient Inference Engine on Compressed Deep Neural Network , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).
[11] Klaus Kofler,et al. Performance and Scalability of GPU-Based Convolutional Neural Networks , 2010, 2010 18th Euromicro Conference on Parallel, Distributed and Network-based Processing.
[12] Christoforos E. Kozyrakis,et al. Convolution engine , 2015, Commun. ACM.
[13] David A. Patterson,et al. In-datacenter performance analysis of a tensor processing unit , 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).
[14] Forrest N. Iandola,et al. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <1MB model size , 2016, ArXiv.
[15] Susanta Mukhopadhyay,et al. Fast Hardware Architecture for 2-D Separable Convolution Operations , 2018, IEEE Transactions on Circuits and Systems II: Express Briefs.
[16] Matthew J. Hausknecht,et al. Beyond short snippets: Deep networks for video classification , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[17] Soheil Ghiasi,et al. CNNdroid: GPU-Accelerated Execution of Trained Deep Convolutional Neural Networks on Android , 2015, ACM Multimedia.
[18] Tianshi Chen,et al. ShiDianNao: Shifting vision processing closer to the sensor , 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).
[19] Hiroki Nakahara,et al. A fully connected layer elimination for a binarized convolutional neural network on an FPGA , 2017, 2017 27th International Conference on Field Programmable Logic and Applications (FPL).
[20] Lars Petersson,et al. DecomposeMe: Simplifying ConvNets for End-to-End Learning , 2016, ArXiv.
[21] David Gregg,et al. Parallel Multi Channel convolution using General Matrix Multiplication , 2017, 2017 IEEE 28th International Conference on Application-specific Systems, Architectures and Processors (ASAP).
[22] Mohammad Rastegari,et al. XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks , 2016.
[23] Trevor Darrell,et al. Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.
[24] Jun-Seok Park,et al. 14.6 A 1.42TOPS/W deep convolutional neural network recognition processor for intelligent IoE systems , 2016, 2016 IEEE International Solid-State Circuits Conference (ISSCC).
[25] Michael Ferdman,et al. Escher: A CNN Accelerator with Flexible Buffering to Minimize Off-Chip Transfer , 2017, 2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM).
[26] Gunar Schirner,et al. Function-Level Processor (FLP): A Novel Processor Class for Efficient Processing of Streaming Applications , 2015, Journal of Signal Processing Systems.
[27] Sherief Reda,et al. Hardware-software codesign of accurate, multiplier-free Deep Neural Networks , 2017, 2017 54th ACM/EDAC/IEEE Design Automation Conference (DAC).
[28] Vincent Lepetit,et al. Learning Separable Filters , 2013, CVPR.
[29] Michael Ferdman,et al. Maximizing CNN accelerator efficiency through resource partitioning , 2016, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).
[30] Alessandro Aimar,et al. NullHop: A Flexible Convolutional Neural Network Accelerator Based on Sparse Representations of Feature Maps , 2017, IEEE Transactions on Neural Networks and Learning Systems.
[31] Zelong Wang,et al. Towards a Uniform Template-based Architecture for Accelerating 2D and 3D CNNs on FPGA , 2018, FPGA.
[32] Dawei Li,et al. DeepCham: Collaborative Edge-Mediated Adaptive Deep Learning for Mobile Object Recognition , 2016, 2016 IEEE/ACM Symposium on Edge Computing (SEC).
[33] Gernot A. Fink,et al. Face Detection Using GPU-Based Convolutional Neural Networks , 2009, CAIP.
[34] Gregory Shakhnarovich,et al. FractalNet: Ultra-Deep Neural Networks without Residuals , 2016, ICLR.
[35] Zhen Li,et al. A survey of neural network accelerators , 2016, Frontiers of Computer Science.
[36] Kyandoghere Kyamakya,et al. CNN based high performance computing for real time image processing on GPU , 2011.
[37] Ahmed Hemani,et al. MOCHA: Morphable Locality and Compression Aware Architecture for Convolutional Neural Networks , 2017, 2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS).
[38] Manoj Alwani,et al. Fused-layer CNN accelerators , 2016, 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[39] Lee-Sup Kim,et al. A kernel decomposition architecture for binary-weight Convolutional Neural Networks , 2017, 2017 54th ACM/EDAC/IEEE Design Automation Conference (DAC).