A Systematic Methodology for Characterizing Scalability of DNN Accelerators using SCALE-Sim
暂无分享,去创建一个
Matthew Mattina | Yuhao Zhu | Tushar Krishna | Jan Moritz Joseph | Ananda Samajdar | Paul Whatmough | P. Whatmough | Matthew Mattina | T. Krishna | Yuhao Zhu | A. Samajdar | J. Joseph
[1] George Kurian,et al. Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation , 2016, ArXiv.
[2] Gu-Yeon Wei,et al. A 16nm 25mm2 SoC with a 54.5x Flexibility-Efficiency Range from Dual-Core Arm Cortex-A53 to eFPGA and Cache-Coherent Accelerators , 2019, 2019 Symposium on VLSI Circuits.
[3] William J. Dally,et al. Simba: Scaling Deep-Learning Inference with Multi-Chip-Module-Based Architecture , 2019, MICRO.
[4] Chong Wang,et al. Deep Speech 2 : End-to-End Speech Recognition in English and Mandarin , 2015, ICML.
[5] Eric S. Chung,et al. A Configurable Cloud-Scale DNN Processor for Real-Time AI , 2018, 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA).
[6] Brucek Khailany,et al. Timeloop: A Systematic Approach to DNN Accelerator Evaluation , 2019, 2019 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).
[7] Christoforos E. Kozyrakis,et al. TETRIS: Scalable and Efficient Neural Network Acceleration with 3D Memory , 2017, ASPLOS.
[8] Jason Cong,et al. Caffeine: Toward Uniformed Representation and Acceleration for Deep Convolutional Neural Networks , 2019, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.
[9] Jason Clemons,et al. Buffets: An Efficient and Composable Storage Idiom for Explicit Decoupled Data Orchestration , 2019, ASPLOS.
[10] David A. Patterson,et al. In-datacenter performance analysis of a tensor processing unit , 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).
[11] H. T. Kung,et al. Maestro: A Memory-on-Logic Architecture for Coordinated Parallel Use of Many Systolic Arrays , 2019, 2019 IEEE 30th International Conference on Application-specific Systems, Architectures and Processors (ASAP).
[12] Matthew Mattina,et al. SCALE-Sim: Systolic CNN Accelerator , 2018, ArXiv.
[13] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[14] Patrick Hansen,et al. FixyNN: Efficient Hardware for Mobile Computer Vision via Transfer Learning , 2019, ArXiv.
[15] H. T. Kung,et al. Packing Sparse Convolutional Neural Networks for Efficient Systolic Array Implementations: Column Combining Under Joint Optimization , 2018, ASPLOS.
[16] Ryan P. Adams,et al. SpArSe: Sparse Architecture Search for CNNs on Resource-Constrained Microcontrollers , 2019, NeurIPS.
[17] Gu-Yeon Wei,et al. A 16-nm Always-On DNN Processor With Adaptive Clocking and Multi-Cycle Banked SRAMs , 2019, IEEE Journal of Solid-State Circuits.
[18] Christoforos E. Kozyrakis,et al. TANGRAM: Optimized Coarse-Grained Dataflow for Scalable NN Accelerators , 2019, ASPLOS.
[19] Yuhao Zhu,et al. ASV: Accelerated Stereo Vision System , 2019, MICRO.
[20] Pradeep Dubey,et al. SCALEDEEP: A scalable compute architecture for learning and evaluating deep networks , 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).
[21] Matthew Mattina,et al. Euphrates: Algorithm-SoC Co-Design for Low-Power Mobile Continuous Vision , 2018, 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA).
[22] H.-S. Philip Wong,et al. On-Chip Memory Technology Design Space Explorations for Mobile Deep Neural Network Accelerators , 2019, 2019 56th ACM/IEEE Design Automation Conference (DAC).
[23] Vivek Sarkar,et al. Understanding Reuse, Performance, and Hardware Cost of DNN Dataflow: A Data-Centric Approach , 2018, MICRO.
[24] Jason Cong,et al. Caffeine: Towards uniformed representation and acceleration for deep convolutional neural networks , 2016, 2016 IEEE/ACM International Conference on Computer-Aided Design (ICCAD).
[25] Bruce Jacob,et al. DRAMSim2: A Cycle Accurate Memory System Simulator , 2011, IEEE Computer Architecture Letters.
[26] Gu-Yeon Wei,et al. SMAUG , 2019, ACM Trans. Archit. Code Optim..
[27] Tat-Seng Chua,et al. Neural Collaborative Filtering , 2017, WWW.
[28] Gu-Yeon Wei,et al. Applications of Deep Neural Networks for Ultra Low Power IoT , 2017, 2017 IEEE International Conference on Computer Design (ICCD).
[29] Swagath Venkataramani,et al. DyHard-DNN: Even More DNN Acceleration with Dynamic Hardware Reconfiguration , 2018, 2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC).
[30] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.