Gu-Yeon Wei | Yuan Yao | David Brooks | Kshitij Bhardwaj | Sam Likun Xi | Paul Whatmough
[1] Suren Jayasuriya, et al. EVA²: Exploiting Temporal Redundancy in Live Computer Vision, 2018, 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA).
[2] Patrick Hansen, et al. FixyNN: Efficient Hardware for Mobile Computer Vision via Transfer Learning, 2019, ArXiv.
[3] Yongqiang Lyu, et al. SNrram: an efficient sparse neural network computation architecture based on resistive random-access memory, 2018, DAC.
[4] Manoj Alwani, et al. Fused-layer CNN accelerators, 2016, 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[5] Jason Cong, et al. Overcoming Data Transfer Bottlenecks in FPGA-based DNN Accelerators via Layer Conscious Memory Management, 2019, 2019 56th ACM/IEEE Design Automation Conference (DAC).
[6] Xianwei Zhang, et al. Optimizing GPU Cache Policies for MI Workloads, 2019, 2019 IEEE International Symposium on Workload Characterization (IISWC).
[7] Nikhil Ketkar, et al. Introduction to PyTorch, 2021, Deep Learning with Python.
[8] Matthew Mattina, et al. Euphrates: Algorithm-SoC Co-Design for Low-Power Mobile Continuous Vision, 2018, 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA).
[9] Mahmut T. Kandemir, et al. GemDroid: a framework to evaluate mobile platforms, 2014, SIGMETRICS '14.
[10] Xi Chen, et al. FP-DNN: An Automated Framework for Mapping Deep Neural Networks onto FPGAs with RTL-HLS Hybrid Templates, 2017, 2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM).
[11] Daehyun Kim, et al. μLayer: Low Latency On-Device Inference Using Cooperative Single-Layer Acceleration and Processor-Friendly Quantization, 2019, EuroSys.
[12] R. Sindhu Reddy, et al. DLAU: A Scalable Deep Learning Accelerator Unit on FPGA, 2018.
[13] Snehasish Kumar, et al. Fusion: Design tradeoffs in coherent cache hierarchies for accelerators, 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).
[14] Gu-Yeon Wei, et al. Co-designing accelerators and SoC interfaces using gem5-Aladdin, 2016, 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[15] David A. Wood, et al. Supporting x86-64 address translation for 100s of GPU lanes, 2014, 2014 IEEE 20th International Symposium on High Performance Computer Architecture (HPCA).
[16] Hyoukjun Kwon, et al. An Analytic Model for Cost-Benefit Analysis of Dataflows in DNN Accelerators, 2018.
[17] William J. Dally, et al. MAGNet: A Modular Accelerator Generator for Neural Networks, 2019, 2019 IEEE/ACM International Conference on Computer-Aided Design (ICCAD).
[18] David A. Patterson, et al. In-datacenter performance analysis of a tensor processing unit, 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).
[19] David A. Wood, et al. Heterogeneous system coherence for integrated CPU-GPU systems, 2013, 2013 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[20] Jinjun Xiong, et al. DNNBuilder: an Automated Tool for Building High-Performance DNN Hardware Accelerators for FPGAs, 2018, 2018 IEEE/ACM International Conference on Computer-Aided Design (ICCAD).
[21] Vivienne Sze, et al. Accelergy: An Architecture-Level Energy Estimation Methodology for Accelerator Designs, 2019, 2019 IEEE/ACM International Conference on Computer-Aided Design (ICCAD).
[22] Reetuparna Das, et al. Bit Prudent In-Cache Acceleration of Deep Convolutional Neural Networks, 2019, 2019 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[23] Nikhil Ketkar, et al. Deep Learning with Python, 2017.
[24] Martin D. Schatz, et al. Deep Learning Inference in Facebook Data Centers: Characterization, Performance Optimizations and Hardware Implications, 2018, ArXiv.
[25] Yoshua Bengio, et al. Gradient-based learning applied to document recognition, 1998, Proc. IEEE.
[26] Patrick Hansen, et al. ISP4ML: Understanding the Role of Image Signal Processing in Efficient Deep Learning Vision Systems, 2019, ArXiv.
[27] Tao Zhang, et al. PRIME: A Novel Processing-in-Memory Architecture for Neural Network Computation in ReRAM-Based Main Memory, 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).
[28] Rachata Ausavarungnirun, et al. Google Workloads for Consumer Devices: Mitigating Data Movement Bottlenecks, 2018, ASPLOS.
[29] Trevor Darrell, et al. Caffe: Convolutional Architecture for Fast Feature Embedding, 2014, ACM Multimedia.
[30] William J. Dally, et al. Simba: Scaling Deep-Learning Inference with Multi-Chip-Module-Based Architecture, 2019, MICRO.
[31] Matthew Mattina, et al. Systolic Tensor Array: An Efficient Structured-Sparse GEMM Accelerator for Mobile CNN Inference, 2020, IEEE Computer Architecture Letters.
[32] Jason Cong, et al. PARADE: A cycle-accurate full-system simulation Platform for Accelerator-Rich Architectural Design and Exploration, 2015, 2015 IEEE/ACM International Conference on Computer-Aided Design (ICCAD).
[33] James C. Hoe, et al. CoRAM: an in-fabric memory architecture for FPGA-based computing, 2011, FPGA '11.
[34] Matthew Mattina, et al. SCALE-Sim: Systolic CNN Accelerator, 2018, ArXiv.
[35] Matthew Mattina, et al. A Systematic Methodology for Characterizing Scalability of DNN Accelerators using SCALE-Sim, 2020, 2020 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).
[36] Andrew B. Kahng, et al. CACTI-IO: CACTI with off-chip power-area-timing models, 2012, 2012 IEEE/ACM International Conference on Computer-Aided Design (ICCAD).
[37] Luca Benini, et al. Energy and performance exploration of accelerator coherency port using Xilinx ZYNQ, 2013.
[38] Jason Cong, et al. Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks, 2015, FPGA.
[39] Asit K. Mishra, et al. From High-Level Deep Network Models to FPGA Acceleration, 2016.
[40] William J. Dally, et al. SCNN: An accelerator for compressed-sparse convolutional neural networks, 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).
[41] Dong Han, et al. Cambricon: An Instruction Set Architecture for Neural Networks, 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).
[42] Abhishek Bhattacharjee, et al. Architectural support for address translation on GPUs: designing memory management units for CPU/GPUs with unified address spaces, 2014, ASPLOS.
[43] Song Han, et al. EIE: Efficient Inference Engine on Compressed Deep Neural Network, 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).
[44] Xuehai Qian, et al. G-TSC: Timestamp Based Coherence for GPUs, 2018, 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[45] Hyoukjun Kwon, et al. MAERI: Enabling Flexible Dataflow Mapping over DNN Accelerators via Reconfigurable Interconnects, 2018, ASPLOS.
[46] David A. Wood, et al. Border control: Sandboxing accelerators, 2015, 2015 48th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[47] Patrick Judd, et al. Stripes: Bit-serial deep neural network computing, 2016, 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[48] Frédo Durand, et al. Halide: a language and compiler for optimizing parallelism, locality, and recomputation in image processing pipelines, 2013, PLDI.
[49] John Wawrzynek, et al. Centrifuge: Evaluating full-system HLS-generated heterogeneous-accelerator SoCs using FPGA-Acceleration, 2019, 2019 IEEE/ACM International Conference on Computer-Aided Design (ICCAD).
[50] Gu-Yeon Wei, et al. A 16nm 25mm² SoC with a 54.5x Flexibility-Efficiency Range from Dual-Core Arm Cortex-A53 to eFPGA and Cache-Coherent Accelerators, 2019, 2019 Symposium on VLSI Circuits.
[51] Priyanka Raina, et al. DNN Dataflow Choice Is Overrated, 2018, ArXiv.
[52] Vivienne Sze, et al. 14.5 Eyeriss: An energy-efficient reconfigurable accelerator for deep convolutional neural networks, 2016, ISSCC.
[53] Gu-Yeon Wei, et al. Minerva: Enabling Low-Power, Highly-Accurate Deep Neural Network Accelerators, 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).
[54] Yuhao Zhu, et al. ASV: Accelerated Stereo Vision System, 2019, MICRO.
[55] Miao Hu, et al. ISAAC: A Convolutional Neural Network Accelerator with In-Situ Analog Arithmetic in Crossbars, 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).
[56] Naehyuck Chang, et al. HSIM-DNN: Hardware Simulator for Computation-, Storage- and Power-Efficient Deep Neural Networks, 2019, ACM Great Lakes Symposium on VLSI.
[57] Dipankar Das, et al. SIGMA: A Sparse and Irregular GEMM Accelerator with Flexible Interconnects for DNN Training, 2020, 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[58] Wei Zhang, et al. PAAS: A system level simulator for heterogeneous computing architectures, 2017, 2017 27th International Conference on Field Programmable Logic and Applications (FPL).
[59] Haichen Shen, et al. TVM: An Automated End-to-End Optimizing Compiler for Deep Learning, 2018, OSDI.
[60] Sarita V. Adve, et al. Spandex: A Flexible Interface for Efficient Heterogeneous Coherence, 2018, 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA).
[61] Luca P. Carloni, et al. Broadening the exploration of the accelerator design space in embedded scalable platforms, 2017, 2017 IEEE High Performance Extreme Computing Conference (HPEC).
[62] Sepp Hochreiter, et al. Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs), 2015, ICLR.
[63] Zheng Zhang, et al. MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems, 2015, ArXiv.
[64] Carole-Jean Wu, et al. MLPerf Inference Benchmark, 2020, 2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA).
[65] Yuan Xie, et al. Sparse Tensor Core: Algorithm and Hardware Co-Design for Vector-wise Sparse Neural Networks on Modern GPUs, 2019, MICRO.
[66] Carole-Jean Wu, et al. Exploiting Parallelism Opportunities with Deep Learning Frameworks, 2019, ACM Trans. Archit. Code Optim.
[67] David A. Wood, et al. Crossing Guard: Mediating Host-Accelerator Coherence Interactions, 2017, ASPLOS.
[68] Jacob Nelson, et al. SNNAP: Approximate computing on programmable SoCs via neural acceleration, 2015, 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA).
[69] Brucek Khailany, et al. Timeloop: A Systematic Approach to DNN Accelerator Evaluation, 2019, 2019 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).
[70] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[71] H.-S. Philip Wong, et al. On-Chip Memory Technology Design Space Explorations for Mobile Deep Neural Network Accelerators, 2019, 2019 56th ACM/IEEE Design Automation Conference (DAC).
[72] Xiaowei Li, et al. FlexFlow: A Flexible Dataflow Accelerator Architecture for Convolutional Neural Networks, 2017, 2017 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[73] Vivek Sarkar, et al. Understanding Reuse, Performance, and Hardware Cost of DNN Dataflow: A Data-Centric Approach, 2018, MICRO.
[74] Chao Wang, et al. CirCNN: Accelerating and Compressing Deep Neural Networks Using Block-Circulant Weight Matrices, 2017, 2017 50th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[75] Yuan Yu, et al. TensorFlow: A system for large-scale machine learning, 2016, OSDI.