SWPU: A 126.04 TFLOPS/W Edge-Device Sparse DNN Training Processor With Dynamic Sub-Structured Weight Pruning
Leibo Liu | Shaojun Wei | Yang Wang | Yubin Qin | Shouyi Yin
[1] Yang Wang, et al. A 28nm 276.55TFLOPS/W Sparse Deep-Neural-Network Training Processor with Implicit Redundancy Speculation and Batch Normalization Reformulation, 2021, 2021 Symposium on VLSI Circuits.
[2] Sunwoo Lee, et al. A 40nm 4.81TFLOPS/W 8b Floating-Point Training Processor for Non-Sparse Neural Networks Using Shared Exponent Bias and 24-Way Fused Multiply-Add Tree, 2021, 2021 IEEE International Solid-State Circuits Conference (ISSCC).
[3] Yang Wang, et al. Evolver: A Deep Learning Processor With On-Device Quantization–Voltage–Frequency Tuning, 2021, IEEE Journal of Solid-State Circuits.
[4] Andreas Moshovos, et al. TensorDash: Exploiting Sparsity to Accelerate Deep Neural Network Training, 2020, 2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[5] Quoc V. Le, et al. Rethinking Pre-training and Self-training, 2020, NeurIPS.
[6] Hoi-Jun Yoo, et al. A 146.52 TOPS/W Deep-Neural-Network Learning Processor with Stochastic Coarse-Fine Pruning and Adaptive Input/Output/Weight Skipping, 2020, 2020 IEEE Symposium on VLSI Circuits.
[7] Hoi-Jun Yoo, et al. 7.4 GANPU: A 135TFLOPS/W Multi-DNN Training Processor for GANs with Speculative Dual-Sparsity Exploitation, 2020, 2020 IEEE International Solid-State Circuits Conference (ISSCC).
[8] Natalia Gimelshein, et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library, 2019, NeurIPS.
[9] Quoc V. Le, et al. EfficientDet: Scalable and Efficient Object Detection, 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[10] Tao Li, et al. Eager Pruning: Algorithm and Architecture Support for Fast Training of Deep Neural Networks, 2019, 2019 ACM/IEEE 46th Annual International Symposium on Computer Architecture (ISCA).
[11] Quoc V. Le, et al. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks, 2019, ICML.
[12] Leonidas J. Guibas, et al. KPConv: Flexible and Deformable Convolution for Point Clouds, 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[13] Leibo Liu, et al. An Energy-Efficient Reconfigurable Processor for Binary- and Ternary-Weight Neural Networks With Flexible Data Bit Width, 2019, IEEE Journal of Solid-State Circuits.
[14] Xin Wang, et al. Parameter Efficient Training of Deep Convolutional Neural Networks by Dynamic Sparse Reparameterization, 2019, ICML.
[15] Hoi-Jun Yoo, et al. 7.7 LNPU: A 25.3TFLOPS/W Sparse Deep-Neural-Network Learning Processor with Fine-Grained Mixed Precision of FP8-FP16, 2019, 2019 IEEE International Solid-State Circuits Conference (ISSCC).
[16] Youngwoo Kim, et al. A 2.1TFLOPS/W Mobile Deep RL Accelerator with Transposable PE Array and Experience Compression, 2019, 2019 IEEE International Solid-State Circuits Conference (ISSCC).
[17] Meng-Fan Chang, et al. Sticker: A 0.41-62.1 TOPS/W 8Bit Neural Network Processor with Multi-Sparsity Compatible Convolution Arrays and Online Tuning Acceleration for Fully Connected Layers, 2018, 2018 IEEE Symposium on VLSI Circuits.
[18] Michael Carbin, et al. The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks, 2018, ICLR.
[19] Jacek M. Zurada, et al. Building Efficient ConvNets using Redundant Feature Pruning, 2018, ArXiv.
[20] Mark Sandler, et al. MobileNetV2: Inverted Residuals and Linear Bottlenecks, 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[21] David Kappel, et al. Deep Rewiring: Training very sparse deep networks, 2017, ICLR.
[22] Peter Stone, et al. Scalable training of artificial neural networks with adaptive sparse connectivity inspired by network science, 2017, Nature Communications.
[23] Chen Sun, et al. Revisiting Unreasonable Effectiveness of Data in Deep Learning Era, 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[24] Scott A. Mahlke, et al. Scalpel: Customizing DNN pruning to the underlying hardware parallelism, 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).
[25] Hanan Samet, et al. Pruning Filters for Efficient ConvNets, 2016, ICLR.
[26] Kilian Q. Weinberger, et al. Densely Connected Convolutional Networks, 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[27] Vivienne Sze, et al. Eyeriss: A Spatial Architecture for Energy-Efficient Dataflow for Convolutional Neural Networks, 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).
[28] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[29] Song Han, et al. Deep Compression: Compressing Deep Neural Network with Pruning, Trained Quantization and Huffman Coding, 2015, ICLR.
[30] Ali Farhadi, et al. You Only Look Once: Unified, Real-Time Object Detection, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[31] Song Han, et al. Learning both Weights and Connections for Efficient Neural Network, 2015, NIPS.
[32] Andrew Zisserman, et al. Very Deep Convolutional Networks for Large-Scale Image Recognition, 2014, ICLR.
[33] Geoffrey E. Hinton, et al. ImageNet classification with deep convolutional neural networks, 2012, Commun. ACM.
[34] Hoi-Jun Yoo, et al. PNPU: An Energy-Efficient Deep-Neural-Network Learning Processor With Stochastic Coarse–Fine Level Weight Pruning and Adaptive Input/Output/Weight Zero Skipping, 2021, IEEE Solid-State Circuits Letters.
[35] Alex Krizhevsky, et al. Learning Multiple Layers of Features from Tiny Images, 2009.