Laconic Deep Learning Inference Acceleration
Dylan Malone Stuart | Zissis Poulos | Mostafa Mahmoud | Sayeh Sharify | Milos Nikolic | Andreas Moshovos | Kevin Siu | Alberto Delmas Lascorz
[1] Xiaowei Li, et al. FlexFlow: A Flexible Dataflow Accelerator Architecture for Convolutional Neural Networks, 2017, 2017 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[2] Ali Farhadi, et al. YOLO9000: Better, Faster, Stronger, 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[3] Max Welling, et al. Bayesian Compression for Deep Learning, 2017, NIPS.
[4] Max Welling, et al. Soft Weight-Sharing for Neural Network Compression, 2017, ICLR.
[5] Song Han, et al. ESE: Efficient Speech Recognition Engine with Sparse LSTM on FPGA, 2016, FPGA.
[6] Joelle Pineau, et al. Conditional Computation in Neural Networks for faster models, 2015, ArXiv.
[7] Shaoli Liu, et al. Cambricon-X: An accelerator for sparse neural networks, 2016, 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[8] Ran El-Yaniv, et al. Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations, 2016, J. Mach. Learn. Res.
[9] Pritish Narayanan, et al. Deep Learning with Limited Numerical Precision, 2015, ICML.
[10] Zengfu Wang, et al. Video Superresolution via Motion Compensation and Deep Residual Learning, 2017, IEEE Transactions on Computational Imaging.
[11] Cyrus Rashtchian, et al. Collecting Image Annotations Using Amazon’s Mechanical Turk, 2010, Mturk@HLT-NAACL.
[12] Scott A. Mahlke, et al. Scalpel: Customizing DNN pruning to the underlying hardware parallelism, 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).
[13] Frédo Durand, et al. Deep joint demosaicking and denoising, 2016, ACM Trans. Graph.
[14] Luc Van Gool, et al. The Pascal Visual Object Classes Challenge: A Retrospective, 2014, International Journal of Computer Vision.
[15] Dongyoung Kim, et al. ZeNA: Zero-Aware Neural Network Accelerator, 2018, IEEE Design & Test.
[16] Song Han, et al. Deep Compression: Compressing Deep Neural Network with Pruning, Trained Quantization and Huffman Coding, 2015, ICLR.
[17] Bin Liu, et al. Ternary Weight Networks, 2016, ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[18] Andreas Moshovos, et al. Bit-Pragmatic Deep Neural Network Computing, 2016, 2017 50th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[19] Mark Sandler, et al. MobileNetV2: Inverted Residuals and Linear Bottlenecks, 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[20] N. Muralimanohar, et al. CACTI 6.0: A Tool to Understand Large Caches, 2007.
[21] Christoph Meinel, et al. Image Captioning with Deep Bidirectional LSTMs, 2016, ACM Multimedia.
[22] Luca Benini, et al. Soft-to-Hard Vector Quantization for End-to-End Learning Compressible Representations, 2017, NIPS.
[23] Dylan Malone Stuart, et al. Memory Requirements for Convolutional Neural Network Hardware Accelerators, 2018, 2018 IEEE International Symposium on Workload Characterization (IISWC).
[24] Xuan Yang, et al. A Systematic Approach to Blocking Convolutional Neural Networks, 2016, ArXiv.
[25] Jia Wang, et al. DaDianNao: A Machine-Learning Supercomputer, 2014, 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[26] Chao Wang, et al. CirCNN: Accelerating and Compressing Deep Neural Networks Using Block-Circulant Weight Matrices, 2017, 2017 50th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[27] H. T. Kung, et al. Packing Sparse Convolutional Neural Networks for Efficient Systolic Array Implementations: Column Combining Under Joint Optimization, 2018, ASPLOS.
[28] Tor M. Aamodt, et al. Analyzing Machine Learning Workloads Using a Detailed GPU Simulator, 2018, 2019 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).
[29] Shuchang Zhou, et al. DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients, 2016, ArXiv.
[30] Vivienne Sze, et al. Designing Energy-Efficient Convolutional Neural Networks Using Energy-Aware Pruning, 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[31] Michael S. Bernstein, et al. ImageNet Large Scale Visual Recognition Challenge, 2014, International Journal of Computer Vision.
[32] William J. Dally, et al. SCNN: An accelerator for compressed-sparse convolutional neural networks, 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).
[33] Roberto Cipolla, et al. Semantic object classes in video: A high-definition ground truth database, 2009, Pattern Recognit. Lett.
[34] Peng Zhang, et al. Automated systolic array architecture synthesis for high throughput CNN inference on FPGAs, 2017, 2017 54th ACM/EDAC/IEEE Design Automation Conference (DAC).
[35] Hyoukjun Kwon, et al. MAERI: Enabling Flexible Dataflow Mapping over DNN Accelerators via Reconfigurable Interconnects, 2018, ASPLOS.
[36] Asit K. Mishra, et al. Low Precision RNNs: Quantizing RNNs Without Losing Accuracy, 2017, ArXiv.
[37] Song Han, et al. ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware, 2018, ICLR.
[38] Bo Chen, et al. Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference, 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[39] Song Han, et al. INVITED: Bandwidth-Efficient Deep Learning, 2018, 2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC).
[40] Sudhakar Yalamanchili, et al. Neurocube: A Programmable Digital Neuromorphic Architecture with High-Density 3D Memory, 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).
[41] Karthikeyan Sankaralingam, et al. Dark Silicon and the End of Multicore Scaling, 2012, IEEE Micro.
[42] Wonyong Sung, et al. Fixed-point performance analysis of recurrent neural networks, 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[43] Kiyoung Choi, et al. ComPEND: Computation Pruning through Early Negative Detection for ReLU in a Deep Neural Network Accelerator, 2018, ICS.
[44] Peter Kontschieder, et al. Decision Forests, Convolutional Networks and the Models in-Between, 2016, ArXiv.
[46] Yurong Chen, et al. Dynamic Network Surgery for Efficient DNNs, 2016, NIPS.
[47] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[48] Hadi Esmaeilzadeh, et al. Bit Fusion: Bit-Level Dynamically Composable Architecture for Accelerating Deep Neural Network, 2017, 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA).
[49] David P. Wipf, et al. Compressing Neural Networks using the Variational Information Bottleneck, 2018, ICML.
[50] Patrick Judd, et al. Stripes: Bit-serial deep neural network computing, 2016, 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[51] Lin Xu, et al. Incremental Network Quantization: Towards Lossless CNNs with Low-Precision Weights, 2017, ICLR.
[52] Eunhyeok Park, et al. Energy-Efficient Neural Network Accelerator Based on Outlier-Aware Low-Precision Computation, 2018, 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA).
[53] Hon Keung Kwan, et al. Multilayer feedforward neural networks with single powers-of-two weights, 1993, IEEE Trans. Signal Process.
[54] Natalie D. Enright Jerger, et al. Proteus: Exploiting Numerical Precision Variability in Deep Neural Networks, 2016, ICS.
[55] Song Han, et al. AMC: AutoML for Model Compression and Acceleration on Mobile Devices, 2018, ECCV.
[56] Yann Dauphin, et al. Hierarchical Neural Story Generation, 2018, ACL.
[57] Yu Wang, et al. Going Deeper with Embedded FPGA Platform for Convolutional Neural Network, 2016, FPGA.
[58] Tianshi Chen, et al. Cambricon-S: Addressing Irregularity in Sparse Neural Networks through A Cooperative Software/Hardware Approach, 2018, 2018 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[59] Boris Murmann, et al. A Pixel Pitch-Matched Ultrasound Receiver for 3-D Photoacoustic Imaging With Integrated Delta-Sigma Beamformer in 28-nm UTBB FD-SOI, 2017, IEEE Journal of Solid-State Circuits.
[60] Yiming Yang, et al. Transformer-XL: Attentive Language Models beyond a Fixed-Length Context, 2019, ACL.
[61] Yiran Chen, et al. Learning Structured Sparsity in Deep Neural Networks, 2016, NIPS.
[62] Alberto Delmas, et al. DPRed: Making Typical Activation Values Matter In Deep Learning Computing, 2018, ArXiv.
[63] Zhenzhi Wu, et al. GXNOR-Net: Training deep neural networks with ternary weights and activations without full-precision memory under a unified discretization framework, 2017, Neural Networks.
[64] Ninghui Sun, et al. DianNao: a small-footprint high-throughput accelerator for ubiquitous machine-learning, 2014, ASPLOS.
[65] Yoshua Bengio, et al. BinaryConnect: Training Deep Neural Networks with binary weights during propagations, 2015, NIPS.
[66] Song Han, et al. EIE: Efficient Inference Engine on Compressed Deep Neural Network, 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).
[67] Rajesh K. Gupta, et al. SnaPEA: Predictive Early Activation for Reducing Computation in Deep Convolutional Neural Networks, 2018, 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA).
[68] Mark Horowitz, et al. 1.1 Computing's energy problem (and what we can do about it), 2014, 2014 IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC).
[69] Song Han, et al. Trained Ternary Quantization, 2016, ICLR.
[70] Song Han, et al. Learning both Weights and Connections for Efficient Neural Network, 2015, NIPS.
[71] Alberto Delmas, et al. Dynamic Stripes: Exploiting the Dynamic Precision Requirements of Activation Values in Neural Networks, 2017, ArXiv.
[72] Norman P. Jouppi, et al. CACTI 6.0: A Tool to Model Large Caches, 2009.
[73] Sergey Ioffe, et al. Rethinking the Inception Architecture for Computer Vision, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[74] Natalie D. Enright Jerger, et al. Reduced-Precision Strategies for Bounded Memory in Deep Neural Nets, 2015, ArXiv.
[75] Taejoon Park, et al. SiMul: An Algorithm-Driven Approximate Multiplier Design for Machine Learning, 2018, IEEE Micro.
[76] Yoshua Bengio, et al. Low precision arithmetic for deep learning, 2014, ICLR.
[77] Ali Farhadi, et al. XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks, 2016, ECCV.
[78] Patrick Judd, et al. Loom: Exploiting Weight and Activation Precisions to Accelerate Convolutional Neural Networks, 2017, 2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC).
[79] Natalie D. Enright Jerger, et al. Cnvlutin: Ineffectual-Neuron-Free Deep Neural Network Computing, 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).
[80] Eunhyeok Park, et al. Value-aware Quantization for Training and Inference of Neural Networks, 2018, ECCV.
[81] Manoj Alwani, et al. Fused-layer CNN accelerators, 2016, 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[82] Pradeep Dubey, et al. Faster CNNs with Direct Sparse Convolutions and Guided Pruning, 2016, ICLR.
[83] Patrick Judd, et al. Bit-Tactical: A Software/Hardware Approach to Exploiting Value and Bit Sparsity in Neural Networks, 2019, ASPLOS.
[84] Yann Dauphin, et al. Language Modeling with Gated Convolutional Networks, 2016, ICML.
[85] Eunhyeok Park, et al. Weighted-Entropy-Based Quantization for Deep Neural Networks, 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[86] Jason Cong, et al. Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks, 2015, FPGA.
[87] Swagath Venkataramani, et al. Exploiting approximate computing for deep learning acceleration, 2018, 2018 Design, Automation & Test in Europe Conference & Exhibition (DATE).
[88] Jungwon Lee, et al. Towards the Limit of Network Quantization, 2016, ICLR.
[89] Sachin S. Talathi, et al. Fixed Point Quantization of Deep Convolutional Networks, 2015, ICML.
[90] Mayler G. A. Martins, et al. Open Cell Library in 15nm FreePDK Technology, 2015, ISPD.
[91] Roberto Cipolla, et al. SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation, 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[92] Eriko Nurvitadhi, et al. WRPN: Wide Reduced-Precision Networks, 2017, ICLR.
[93] Joel Emer, et al. Eyeriss: A spatial architecture for energy-efficient dataflow for convolutional neural networks, 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).
[94] Jitendra Malik, et al. A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics, 2001, Proceedings of the Eighth IEEE International Conference on Computer Vision (ICCV 2001).
[95] Vijay Vasudevan, et al. Learning Transferable Architectures for Scalable Image Recognition, 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[96] Jiwen Lu, et al. Runtime Neural Pruning, 2017, NIPS.
[97] Dongyoung Kim, et al. A novel zero weight/activation-aware hardware architecture of convolutional neural network, 2017, Design, Automation & Test in Europe Conference & Exhibition (DATE).
[98] Victor S. Lempitsky, et al. Fast ConvNets Using Group-Wise Brain Damage, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[99] Paris Smaragdis, et al. Bitwise Neural Networks, 2016, ArXiv.
[100] Christoforos E. Kozyrakis, et al. TETRIS: Scalable and Efficient Neural Network Acceleration with 3D Memory, 2017, ASPLOS.