Reconfigurability, Why It Matters in AI Tasks Processing: A Survey of Reconfigurable AI Chips
Leibo Liu, S. Yin, Shaojun Wei, Yang Wang, Fengbin Tu, Xinhan Lin
[1] S. Kwon,et al. A Multi-Mode 8k-MAC HW-Utilization-Aware Neural Processing Unit With a Unified Multi-Precision Datapath in 4-nm Flagship Mobile SoC , 2023, IEEE Journal of Solid-State Circuits.
[2] V. Gadepally,et al. AI and ML Accelerator Survey and Trends , 2022, 2022 IEEE High Performance Extreme Computing Conference (HPEC).
[3] R. Balasubramonian,et al. CANDLES: Channel-Aware Novel Dataflow-Microarchitecture Co-Design for Low Energy Sparse Neural Network Acceleration , 2022, 2022 IEEE International Symposium on High-Performance Computer Architecture (HPCA).
[4] Ardavan Pedram,et al. Griffin: Rethinking Sparse Optimization for Deep Learning Architectures , 2021, 2022 IEEE International Symposium on High-Performance Computer Architecture (HPCA).
[5] Matthew Mattina,et al. S2TA: Exploiting Structured Sparsity for Energy-Efficient Mobile CNN Acceleration , 2021, 2022 IEEE International Symposium on High-Performance Computer Architecture (HPCA).
[6] Siyuan Qiu,et al. An Energy-Efficient Low-Latency 3D-CNN Accelerator Leveraging Temporal Locality, Full Zero-Skipping, and Hierarchical Load Balance , 2021, 2021 58th ACM/IEEE Design Automation Conference (DAC).
[7] Jeremy Kepner,et al. AI Accelerator Survey and Trends , 2021, 2021 IEEE High Performance Extreme Computing Conference (HPEC).
[8] J. Doifode,et al. A Survey Paper on Acceleration of Convolutional Neural Network using Field Programmable Gate Arrays , 2021, 2021 12th International Conference on Computing Communication and Networking Technologies (ICCCNT).
[9] Yang Wang,et al. A 28nm 276.55TFLOPS/W Sparse Deep-Neural-Network Training Processor with Implicit Redundancy Speculation and Batch Normalization Reformulation , 2021, 2021 Symposium on VLSI Circuits.
[10] Joel Silberman,et al. RaPiD: AI Accelerator for Ultra-low Precision Training and Inference , 2021, 2021 ACM/IEEE 48th Annual International Symposium on Computer Architecture (ISCA).
[11] Hanwoong Jung,et al. Sparsity-Aware and Re-configurable NPU Architecture for Samsung Flagship Mobile SoC , 2021, 2021 ACM/IEEE 48th Annual International Symposium on Computer Architecture (ISCA).
[12] Yang Wang,et al. Dual-side Sparse Tensor Core , 2021, 2021 ACM/IEEE 48th Annual International Symposium on Computer Architecture (ISCA).
[13] Hanwoong Jung,et al. 9.5 A 6K-MAC Feature-Map-Sparsity-Aware Neural Processing Unit in 5nm Flagship Mobile SoC , 2021, 2021 IEEE International Solid- State Circuits Conference (ISSCC).
[14] Marian Verhelst,et al. 9.4 PIU: A 248GOPS/W Stream-Based Processor for Irregular Probabilistic Inference Networks Using Precision-Scalable Posit Arithmetic in 28nm , 2021, 2021 IEEE International Solid- State Circuits Conference (ISSCC).
[15] Hyoukjun Kwon,et al. Heterogeneous Dataflow Accelerators for Multi-DNN Workloads , 2020, 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA).
[16] H. Yoo,et al. An Overview of Energy-Efficient Hardware Accelerators for On-Device Deep-Neural-Network Training , 2021, IEEE Open Journal of the Solid-State Circuits Society.
[17] Muhammad Shafique,et al. Hardware and Software Optimizations for Accelerating Deep Neural Networks: Survey of Current Trends, Challenges, and the Road Ahead , 2020, IEEE Access.
[18] M. Eleuldj,et al. Survey of Deep Learning Neural Networks Implementation on FPGAs , 2020, 2020 5th International Conference on Cloud Computing and Artificial Intelligence: Technologies and Applications (CloudTech).
[19] Guy Lemieux,et al. Procrustes: a Dataflow and Accelerator for Sparse Deep Neural Network Training , 2020, 2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[20] Jeremy Kepner,et al. Survey of Machine Learning Accelerators , 2020, 2020 IEEE High Performance Extreme Computing Conference (HPEC).
[21] Dongup Kwon,et al. A Multi-Neural Network Acceleration Architecture , 2020, 2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA).
[22] Yuan Xie,et al. Model Compression and Hardware Acceleration for Neural Networks: A Comprehensive Survey , 2020, Proceedings of the IEEE.
[23] Yiran Chen,et al. A Survey of Accelerator Architectures for Deep Neural Networks , 2020.
[24] Xuehai Qian,et al. AccPar: Tensor Partitioning for Heterogeneous Deep Learning Accelerators , 2020, 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[25] Chia-Hung Liu,et al. 7.1 A 3.4-to-13.3TOPS/W 3.6TOPS Dual-Core Deep-Learning Accelerator for Versatile AI Applications in 7nm 5G Smartphone SoC , 2020, 2020 IEEE International Solid- State Circuits Conference - (ISSCC).
[26] Hoi-Jun Yoo,et al. 7.4 GANPU: A 135TFLOPS/W Multi-DNN Training Processor for GANs with Speculative Dual-Sparsity Exploitation , 2020, 2020 IEEE International Solid- State Circuits Conference - (ISSCC).
[27] Dipankar Das,et al. SIGMA: A Sparse and Irregular GEMM Accelerator with Flexible Interconnects for DNN Training , 2020, 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[28] Nirali R. Nanavati,et al. Efficient Hardware Implementations of Deep Neural Networks: A Survey , 2020, 2020 Fourth International Conference on Inventive Systems and Control (ICISC).
[29] Yangdong Deng,et al. A Survey of Coarse-Grained Reconfigurable Architecture and Design , 2019, ACM Comput. Surv..
[30] Aamer Jaleel,et al. ExTensor: An Accelerator for Sparse Tensor Algebra , 2019, MICRO.
[31] Patrick Judd,et al. ShapeShifter: Enabling Fine-Grain Data Width Adaptation in Deep Learning , 2019, MICRO.
[32] Jeremy Kepner,et al. Survey and Benchmarking of Machine Learning Accelerators , 2019, 2019 IEEE High Performance Extreme Computing Conference (HPEC).
[33] Christoforos E. Kozyrakis,et al. TANGRAM: Optimized Coarse-Grained Dataflow for Scalable NN Accelerators , 2019, ASPLOS.
[34] Leibo Liu,et al. A High Throughput Acceleration for Hybrid Neural Networks With Efficient Resource Management on FPGA , 2019, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.
[35] Leibo Liu,et al. An Energy-Efficient Reconfigurable Processor for Binary- and Ternary-Weight Neural Networks With Flexible Data Bit Width , 2019, IEEE Journal of Solid-State Circuits.
[36] David Wentzlaff,et al. The Accelerator Wall: Limits of Chip Specialization , 2019, 2019 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[37] Hoi-Jun Yoo,et al. 7.7 LNPU: A 25.3TFLOPS/W Sparse Deep-Neural-Network Learning Processor with Fine-Grained Mixed Precision of FP8-FP16 , 2019, 2019 IEEE International Solid- State Circuits Conference - (ISSCC).
[38] Youngwoo Kim,et al. A 2.1TFLOPS/W Mobile Deep RL Accelerator with Transposable PE Array and Experience Compression , 2019, 2019 IEEE International Solid- State Circuits Conference - (ISSCC).
[39] Meng-Fan Chang,et al. 7.5 A 65nm 0.39-to-140.3TOPS/W 1-to-12b Unified Neural Network Processor Using Block-Circulant-Enabled Transpose-Domain Acceleration with 8.1× Higher TOPS/mm² and 6T HBST-TRAM-Based 2D Data-Reuse Architecture , 2019, 2019 IEEE International Solid- State Circuits Conference - (ISSCC).
[40] Jae-Gon Lee,et al. 7.1 An 11.5TOPS/W 1024-MAC Butterfly Structure Dual-Core Sparsity-Aware Neural Processing Unit in 8nm Flagship Mobile SoC , 2019, 2019 IEEE International Solid- State Circuits Conference - (ISSCC).
[41] Xuehai Qian,et al. HyPar: Towards Hybrid Parallelism for Deep Learning Accelerator Array , 2019, 2019 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[42] Hoi-Jun Yoo,et al. UNPU: An Energy-Efficient Deep Neural Network Accelerator With Fully Variable Weight Bit Precision , 2019, IEEE Journal of Solid-State Circuits.
[43] Zhijian Liu,et al. HAQ: Hardware-Aware Automated Quantization With Mixed Precision , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[44] H. T. Kung,et al. Packing Sparse Convolutional Neural Networks for Efficient Systolic Array Implementations: Column Combining Under Joint Optimization , 2018, ASPLOS.
[45] Jae-Joon Han,et al. Learning to Quantize Deep Networks by Optimizing Quantization Intervals With Task Loss , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[46] Minjie Wang,et al. Supporting Very Large Models using Automatic Dataflow Graph Partitioning , 2018, EuroSys.
[47] Alexander Aiken,et al. Beyond Data and Model Parallelism for Deep Neural Networks , 2018, SysML.
[48] Vivienne Sze,et al. Eyeriss v2: A Flexible Accelerator for Emerging Deep Neural Networks on Mobile Devices , 2018, IEEE Journal on Emerging and Selected Topics in Circuits and Systems.
[49] Alessandro Aimar,et al. NullHop: A Flexible Convolutional Neural Network Accelerator Based on Sparse Representations of Feature Maps , 2017, IEEE Transactions on Neural Networks and Learning Systems.
[50] Mauricio Acconcia Dias,et al. Deep Learning in Reconfigurable Hardware: A Survey , 2019, 2019 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW).
[51] Tianshi Chen,et al. Cambricon-S: Addressing Irregularity in Sparse Neural Networks through A Cooperative Software/Hardware Approach , 2018, 2018 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[52] Chunhua Deng,et al. PermDNN: Efficient Compressed DNN Architecture with Permuted Diagonal Matrices , 2018, 2018 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[53] Jongsun Park,et al. Mosaic-CNN: A Combined Two-Step Zero Prediction Approach to Trade off Accuracy and Computation Energy in Convolutional Neural Networks , 2018, IEEE Journal on Emerging and Selected Topics in Circuits and Systems.
[54] Yanzhi Wang,et al. StructADMM: A Systematic, High-Efficiency Framework of Structured Weight Pruning for DNNs , 2018, ArXiv:1807.11091.
[55] G. Hua,et al. LQ-Nets: Learned Quantization for Highly Accurate and Compact Deep Neural Networks , 2018, ECCV.
[56] Tao Li,et al. Prediction Based Execution on Deep Neural Networks , 2018, 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA).
[57] Xiangyu Li,et al. LCP: a Layer Clusters Paralleling mapping method for accelerating Inception and Residual networks on FPGA , 2018, 2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC).
[58] Rajesh K. Gupta,et al. SnaPEA: Predictive Early Activation for Reducing Computation in Deep Convolutional Neural Networks , 2018, 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA).
[59] Minjie Wang,et al. Unifying Data, Model and Hybrid Parallelism in Deep Learning via Tensor Tiling , 2018, ArXiv.
[60] David Blaauw,et al. Neural Cache: Bit-Serial In-Cache Acceleration of Deep Neural Networks , 2018, 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA).
[61] David Blaauw,et al. OuterSPACE: An Outer Product Based Sparse Matrix Multiplication Accelerator , 2018, 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[62] Hyoukjun Kwon,et al. MAERI: Enabling Flexible Dataflow Mapping over DNN Accelerators via Reconfigurable Interconnects , 2018, ASPLOS.
[63] Rajeev Balasubramonian,et al. Moving CNN Accelerator Computations Closer to Data , 2018, 2018 1st Workshop on Energy Efficient Machine Learning and Cognitive Computing for Embedded Applications (EMC2).
[64] Bo Chen,et al. Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[65] Hadi Esmaeilzadeh,et al. Bit Fusion: Bit-Level Dynamically Composable Architecture for Accelerating Deep Neural Network , 2017, 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA).
[66] Wei Pan,et al. Towards Accurate Binary Convolutional Neural Network , 2017, NIPS.
[67] Yiran Chen,et al. MeDNN: A distributed mobile system with enhanced partition and deployment for large-scale DNNs , 2017, 2017 IEEE/ACM International Conference on Computer-Aided Design (ICCAD).
[68] Xu Sun,et al. meProp: Sparsified Back Propagation for Accelerated Deep Learning with Reduced Overfitting , 2017, ICML.
[69] Samy Bengio,et al. Device Placement Optimization with Reinforcement Learning , 2017, ICML.
[70] William J. Dally,et al. SCNN: An accelerator for compressed-sparse convolutional neural networks , 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).
[71] Leibo Liu,et al. Deep Convolutional Neural Network Architecture With Reconfigurable Computation Patterns , 2017, IEEE Transactions on Very Large Scale Integration (VLSI) Systems.
[72] Vivienne Sze,et al. Efficient Processing of Deep Neural Networks: A Tutorial and Survey , 2017, Proceedings of the IEEE.
[73] Xiaowei Li,et al. FlexFlow: A Flexible Dataflow Accelerator Architecture for Convolutional Neural Networks , 2017, 2017 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[74] Michael Ferdman,et al. Maximizing CNN accelerator efficiency through resource partitioning , 2016, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).
[75] V. Sze,et al. Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks , 2016, IEEE Journal of Solid-State Circuits.
[76] Jason Cong,et al. Caffeine: Towards uniformed representation and acceleration for deep convolutional neural networks , 2016, 2016 IEEE/ACM International Conference on Computer-Aided Design (ICCAD).
[77] Shaoli Liu,et al. Cambricon-X: An accelerator for sparse neural networks , 2016, 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[78] Manoj Alwani,et al. Fused-layer CNN accelerators , 2016, 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[79] George Kurian,et al. Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation , 2016, ArXiv.
[80] Gu-Yeon Wei,et al. Minerva: Enabling Low-Power, Highly-Accurate Deep Neural Network Accelerators , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).
[81] Natalie D. Enright Jerger,et al. Cnvlutin: Ineffectual-Neuron-Free Deep Neural Network Computing , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).
[82] Ran El-Yaniv,et al. Binarized Neural Networks , 2016, NIPS.
[83] Song Han,et al. EIE: Efficient Inference Engine on Compressed Deep Neural Network , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).
[84] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[85] Patrick Judd,et al. Stripes: Bit-serial deep neural network computing , 2016, 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[86] Tianshi Chen,et al. ShiDianNao: Shifting vision processing closer to the sensor , 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).
[87] Song Han,et al. Learning both Weights and Connections for Efficient Neural Network , 2015, NIPS.
[88] Jason Cong,et al. Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks , 2015, FPGA.
[89] Dumitru Erhan,et al. Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[90] Brendan J. Frey,et al. Winner-Take-All Autoencoders , 2014, NIPS.
[91] Ming Yang,et al. Compressing Deep Convolutional Networks using Vector Quantization , 2014, ArXiv.
[92] Alex Krizhevsky,et al. One weird trick for parallelizing convolutional neural networks , 2014, ArXiv.
[93] Marc'Aurelio Ranzato,et al. Large Scale Distributed Deep Networks , 2012, NIPS.
[94] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.