Why Compete When You Can Work Together: FPGA-ASIC Integration for Persistent RNNs
Gregory K. Chen | Ram Krishnamurthy | Aravind Dasu | Martin Langhammer | Eriko Nurvitadhi | Ali Jafari | Andrew Boutros | Dongup Kwon | Raghavan Kumar | Jaewoong Sim | Bogdan Pasca | Sergey Gribok | Debbie Marr | Phillip Tomson | Huseyin Sumbul | Phil V. Knag
[1] Christoforos E. Kozyrakis,et al. A case for intelligent RAM , 1997, IEEE Micro.
[2] David A. Patterson,et al. In-datacenter performance analysis of a tensor processing unit , 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).
[3] Vaughn Betz,et al. You Cannot Improve What You Do not Measure , 2018, ACM Trans. Reconfigurable Technol. Syst.
[4] Norbert Wehn,et al. FINN-L: Library Extensions and Design Trade-Off Analysis for Variable Precision LSTM Networks on FPGAs , 2018, 2018 28th International Conference on Field Programmable Logic and Applications (FPL).
[5] Xi Chen,et al. FP-DNN: An Automated Framework for Mapping Deep Neural Networks onto FPGAs with RTL-HLS Hybrid Templates , 2017, 2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM).
[6] David Blaauw,et al. Neural Cache: Bit-Serial In-Cache Acceleration of Deep Neural Networks , 2018, 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA).
[7] Song Han,et al. ESE: Efficient Speech Recognition Engine with Sparse LSTM on FPGA , 2016, FPGA.
[8] Aravind Dasu,et al. In-Package Domain-Specific ASICs for Intel® Stratix® 10 FPGAs: A Case Study of Accelerating Deep Learning Using TensorTile ASIC , 2018, FPL.
[9] Mohamed S. Abdelfattah,et al. DLA: Compiler and FPGA Overlay for Neural Network Inference Acceleration , 2018, 2018 28th International Conference on Field Programmable Logic and Applications (FPL).
[10] William J. Dally,et al. Fine-Grained DRAM: Energy-Efficient DRAM for Extreme Bandwidth Systems , 2017, 2017 50th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[11] Zheng Guo,et al. A 23.6-Mb/mm $^{2}$ SRAM in 10-nm FinFET Technology With Pulsed-pMOS TVC and Stepped-WL for Low-Voltage Applications , 2019, IEEE Journal of Solid-State Circuits.
[12] Aravind Dasu,et al. In-Package Domain-Specific ASICs for Intel® Stratix® 10 FPGAs: A Case Study of Accelerating Deep Learning Using TensorTile ASIC , 2018, 2018 28th International Conference on Field Programmable Logic and Applications (FPL).
[13] Eric S. Chung,et al. A Configurable Cloud-Scale DNN Processor for Real-Time AI , 2018, 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA).
[14] Christoforos E. Kozyrakis,et al. TETRIS: Scalable and Efficient Neural Network Acceleration with 3D Memory , 2017, ASPLOS.
[15] Erich Elsen,et al. Persistent RNNs: Stashing Recurrent Weights On-Chip , 2016, ICML.
[16] Lukasz Kaiser,et al. Attention Is All You Need , 2017, NIPS.
[17] Eriko Nurvitadhi,et al. Accelerating recurrent neural networks in analytics servers: Comparison of FPGA, CPU, GPU, and ASIC , 2016, 2016 26th International Conference on Field Programmable Logic and Applications (FPL).
[18] Jeff Pool,et al. Sparse Persistent RNNs: Squeezing Large Recurrent Networks On-Chip , 2018, ICLR.
[19] C. Auth,et al. A 10nm high performance and low-power CMOS technology featuring 3rd generation FinFET transistors, Self-Aligned Quad Patterning, contact over active gate and cobalt local interconnects , 2017, 2017 IEEE International Electron Devices Meeting (IEDM).
[20] Ali Akoglu,et al. A power efficient reconfigurable system-in-stack: 3D integration of accelerators, FPGAs, and DRAM , 2014, 2014 27th IEEE International System-on-Chip Conference (SOCC).
[21] Martin Langhammer,et al. Activation Function Architectures for FPGAs , 2018, 2018 28th International Conference on Field Programmable Logic and Applications (FPL).
[22] Kiyoung Choi,et al. PIM-enabled instructions: A low-overhead, locality-aware processing-in-memory architecture , 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).
[23] Martin Langhammer,et al. Floating-Point DSP Block Architecture for FPGAs , 2015, FPGA.
[24] Norbert Wehn,et al. Hardware architecture of Bidirectional Long Short-Term Memory Neural Network for Optical Character Recognition , 2017, Design, Automation & Test in Europe Conference & Exhibition (DATE), 2017.
[25] Eriko Nurvitadhi,et al. Can FPGAs Beat GPUs in Accelerating Next-Generation Deep Neural Networks? , 2017, FPGA.
[26] George Kurian,et al. Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation , 2016, ArXiv.