SIGMA: A Sparse and Irregular GEMM Accelerator with Flexible Interconnects for DNN Training
Dipankar Das | Hyoukjun Kwon | Tushar Krishna | Sudarshan Srinivasan | Bharat Kaul | Eric Qin | Ananda Samajdar | Vineet Nadella