Buffets: An Efficient and Composable Storage Idiom for Explicit Decoupled Data Orchestration
暂无分享,去创建一个
Jason Clemons | Stephen W. Keckler | Joel S. Emer | Neal Clayton Crago | Christopher W. Fletcher | Michael Pellauer | Rangharajan Venkatesan | Yakun Sophia Shao | Kartik Hegde | Rangharajan Venkatesan | J. Emer | S. Keckler | Jason Clemons | Michael Pellauer | N. Crago | Y. Shao | Kartik Hegde
[1] James C. Hoe,et al. Synthesis of operation-centric hardware descriptions , 2000, IEEE/ACM International Conference on Computer Aided Design. ICCAD - 2000. IEEE/ACM Digest of Technical Papers (Cat. No.00CH37140).
[2] Karthikeyan Sankaralingam,et al. Stream-dataflow acceleration , 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).
[3] Jung Ho Ahn,et al. CACTI-P: Architecture-level modeling for SRAM-based structures with advanced leakage reduction techniques , 2011, 2011 IEEE/ACM International Conference on Computer-Aided Design (ICCAD).
[4] Xuehai Zhou,et al. PuDianNao: A Polyvalent Machine Learning Accelerator , 2015, ASPLOS.
[5] Song Han,et al. EIE: Efficient Inference Engine on Compressed Deep Neural Network , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).
[6] Xiaowei Li,et al. C-Brain: A deep learning accelerator that tames the diversity of CNNs through adaptive data-level parallelization , 2016, 2016 53nd ACM/EDAC/IEEE Design Automation Conference (DAC).
[7] Luca P. Carloni,et al. An analysis of accelerator coupling in heterogeneous architectures , 2015, 2015 52nd ACM/EDAC/IEEE Design Automation Conference (DAC).
[8] Hyoukjun Kwon,et al. MAERI: Enabling Flexible Dataflow Mapping over DNN Accelerators via Reconfigurable Interconnects , 2018, ASPLOS.
[9] Kenneth A. Ross,et al. Q100: the architecture and design of a database processing unit , 2014, ASPLOS.
[10] Collin McCurdy,et al. The Scalable Heterogeneous Computing (SHOC) benchmark suite , 2010, GPGPU-3.
[11] Christoforos E. Kozyrakis,et al. Convolution engine: balancing efficiency & flexibility in specialized computing , 2013, ISCA.
[12] Christopher Torng,et al. A modular digital VLSI flow for high-productivity SoC design , 2018, DAC.
[13] Berin Martini,et al. NeuFlow: A runtime reconfigurable dataflow processor for vision , 2011, CVPR 2011 WORKSHOPS.
[14] David A. Patterson,et al. In-datacenter performance analysis of a tensor processing unit , 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).
[15] Jason Cong,et al. Accelerator-rich architectures: Opportunities and progresses , 2014, 2014 51st ACM/EDAC/IEEE Design Automation Conference (DAC).
[16] Sarita V. Adve,et al. Stash: Have your scratchpad and cache it too , 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).
[17] Kermin Fleming,et al. Leap scratchpads: automatic memory and cache management for reconfigurable logic , 2010, FPGA '11.
[18] Gu-Yeon Wei,et al. Co-designing accelerators and SoC interfaces using gem5-Aladdin , 2016, 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[19] Eric S. Chung,et al. A Configurable Cloud-Scale DNN Processor for Real-Time AI , 2018, 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA).
[20] Chih-Chi Cheng,et al. A patch memory system for image processing and computer vision , 2016, 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[21] Kunle Olukotun,et al. Plasticine: A reconfigurable architecture for parallel patterns , 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).
[22] Tao Chen,et al. Efficient data supply for hardware accelerators with prefetching and access/execute decoupling , 2016, 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[23] Gu-Yeon Wei,et al. Aladdin: A pre-RTL, power-performance accelerator simulator enabling large design space exploration of customized architectures , 2014, 2014 ACM/IEEE 41st International Symposium on Computer Architecture (ISCA).
[24] Ninghui Sun,et al. DianNao: a small-footprint high-throughput accelerator for ubiquitous machine-learning , 2014, ASPLOS.
[25] William J. Dally,et al. SCNN: An accelerator for compressed-sparse convolutional neural networks , 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).
[26] Margaret Martonosi,et al. DeSC: Decoupled supply-compute communication management for heterogeneous architectures , 2015, 2015 48th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[27] Xi Chen,et al. FP-DNN: An Automated Framework for Mapping Deep Neural Networks onto FPGAs with RTL-HLS Hybrid Templates , 2017, 2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM).
[28] Andrew Zisserman,et al. Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.
[29] Tianshi Chen,et al. ShiDianNao: Shifting vision processing closer to the sensor , 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).
[30] Zhen Fang,et al. Buffer-Integrated-Cache: A cost-effective SRAM architecture for handheld and embedded platforms , 2011, 2011 48th ACM/EDAC/IEEE Design Automation Conference (DAC).
[31] Asit K. Mishra,et al. From high-level deep neural models to FPGAs , 2016, 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[32] Daniel Sánchez,et al. Jenga: Software-defined cache hierarchies , 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).
[33] Xiaowei Li,et al. FlexFlow: A Flexible Dataflow Accelerator Architecture for Convolutional Neural Networks , 2017, 2017 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[34] William J. Dally,et al. The GPU Computing Era , 2010, IEEE Micro.
[35] Mengjia Yan,et al. UCNN: Exploiting Computational Reuse in Deep Neural Networks via Weight Repetition , 2018, 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA).
[36] James C. Hoe,et al. CoRAM: an in-fabric memory architecture for FPGA-based computing , 2011, FPGA '11.
[37] Gu-Yeon Wei,et al. The accelerator store: A shared memory framework for accelerator-based systems , 2012, TACO.
[38] Jia Wang,et al. DaDianNao: A Machine-Learning Supercomputer , 2014, 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture.
[39] Nam Sung Kim,et al. GANAX: A Unified MIMD-SIMD Acceleration for Generative Adversarial Networks , 2018, 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA).