An Overview of Efficient Interconnection Networks for Deep Neural Network Accelerators
Maurizio Palesi | Kun-Chih Chen | Tim Kogel | Masoumeh Ebrahimi | Seyed Morteza Nabavinejad | Mohammad Baharloo
[1] Bing Chen,et al. A general memristor-based partial differential equation solver , 2018, Nature Electronics.
[2] Philip Heng Wai Leong,et al. FINN: A Framework for Fast, Scalable Binarized Neural Network Inference , 2016, FPGA.
[3] Hao Jiang,et al. A Memristor Crossbar Based Computing Engine Optimized for High Speed and Accuracy , 2016, 2016 IEEE Computer Society Annual Symposium on VLSI (ISVLSI).
[4] Kunle Olukotun,et al. Plasticine: A reconfigurable architecture for parallel patterns , 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).
[5] Mahmut T. Kandemir,et al. ResiRCA: A Resilient Energy Harvesting ReRAM Crossbar-Based Accelerator for Intelligent Embedded Processors , 2020, 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[6] Yuan Xie,et al. FPSA: A Full System Stack Solution for Reconfigurable ReRAM-based NN Accelerator Architecture , 2019, ASPLOS.
[7] Dejan S. Milojicic,et al. PUMA: A Programmable Ultra-efficient Memristor-based Accelerator for Machine Learning Inference , 2019, ASPLOS.
[8] Lena Mashayekhy,et al. ApproxDNN: Incentivizing DNN Approximation in Cloud , 2020, 2020 20th IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGRID).
[9] Eitan Medina,et al. Habana Labs Purpose-Built AI Inference and Training Processor Architectures: Scaling AI Training Systems Using Standard Ethernet With Gaudi Processor , 2020, IEEE Micro.
[10] Michael Ferdman,et al. Escher: A CNN Accelerator with Flexible Buffering to Minimize Off-Chip Transfer , 2017, 2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM).
[11] David R. Kaeli,et al. Profiling DNN Workloads on a Volta-based DGX-1 System , 2018, 2018 IEEE International Symposium on Workload Characterization (IISWC).
[12] Xu Liu,et al. Evaluating Modern GPU Interconnect: PCIe, NVLink, NV-SLI, NVSwitch and GPUDirect , 2019, IEEE Transactions on Parallel and Distributed Systems.
[13] Kun-Chih Chen,et al. A NoC-based simulator for design and evaluation of deep neural networks , 2020, Microprocess. Microsystems.
[14] Bing Li,et al. RED: A ReRAM-Based Efficient Accelerator for Deconvolutional Computation , 2020, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.
[15] Yiran Chen,et al. ReCom: An efficient resistive accelerator for compressed deep neural networks , 2018, 2018 Design, Automation & Test in Europe Conference & Exhibition (DATE).
[16] James Demmel,et al. Scaling Deep Learning on GPU and Knights Landing clusters , 2017, SC17: International Conference for High Performance Computing, Networking, Storage and Analysis.
[17] Hyeran Jeon,et al. Graph processing on GPUs: Where are the bottlenecks? , 2014, 2014 IEEE International Symposium on Workload Characterization (IISWC).
[18] Stefano Markidis,et al. Performance Evaluation of Advanced Features in CUDA Unified Memory , 2019, 2019 IEEE/ACM Workshop on Memory Centric High Performance Computing (MCHPC).
[19] Larry R. Dennison,et al. Why Data Science and Machine Learning Need Silicon Photonics , 2020, 2020 Optical Fiber Communications Conference and Exhibition (OFC).
[20] Massimo Alioto,et al. Guest Editorial Energy-Quality Scalable Circuits and Systems for Sensing and Computing: From Approximate to Communication-Inspired and Learning-Based , 2018, IEEE J. Emerg. Sel. Topics Circuits Syst..
[21] Gerard J. M. Smit,et al. Fixed latency on-chip interconnect for hardware spiking neural network architectures , 2013, Parallel Comput..
[22] Wolfgang Straßer,et al. Fast and Scalable CPU/GPU Collision Detection for Rigid and Deformable Surfaces , 2010, Comput. Graph. Forum.
[23] Asit K. Mishra,et al. From high-level deep neural models to FPGAs , 2016, 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[24] Daan Wierstra,et al. One-shot Learning with Memory-Augmented Neural Networks , 2016, ArXiv.
[25] Chris Fallin,et al. CHIPPER: A low-complexity bufferless deflection router , 2011, 2011 IEEE 17th International Symposium on High Performance Computer Architecture.
[26] Yiran Chen,et al. PipeLayer: A Pipelined ReRAM-Based Accelerator for Deep Learning , 2017, 2017 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[27] Yu Wang,et al. GraphSAR: a sparsity-aware processing-in-memory architecture for large-scale graph processing on ReRAMs , 2019, ASP-DAC.
[28] Yuan Xie,et al. Learning the sparsity for ReRAM: mapping and pruning sparse neural network for ReRAM based accelerator , 2019, ASP-DAC.
[29] Hao Jiang,et al. RENO: A high-efficient reconfigurable neuromorphic computing accelerator design , 2015, 2015 52nd ACM/EDAC/IEEE Design Automation Conference (DAC).
[30] Masoud Daneshtalab,et al. Reconfigurable Network-on-Chip for 3D Neural Network Accelerators , 2018, 2018 Twelfth IEEE/ACM International Symposium on Networks-on-Chip (NOCS).
[31] Liam McDaid,et al. Advancing interconnect density for spiking neural network hardware implementations using traffic-aware adaptive network-on-chip routers , 2012, Neural Networks.
[32] Miao Hu,et al. ISAAC: A Convolutional Neural Network Accelerator with In-Situ Analog Arithmetic in Crossbars , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).
[33] Hoi-Jun Yoo,et al. An Energy-Efficient Embedded Deep Neural Network Processor for High Speed Visual Attention in Mobile Vision Recognition SoC , 2016, IEEE Journal of Solid-State Circuits.
[34] Sudhakar Yalamanchili,et al. DeepTrain: A Programmable Embedded Platform for Training Deep Neural Networks , 2018, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.
[35] Manoj Alwani,et al. Fused-layer CNN accelerators , 2016, 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[36] Hyoukjun Kwon,et al. Rethinking NoCs for spatial neural network accelerators , 2017, 2017 Eleventh IEEE/ACM International Symposium on Networks-on-Chip (NOCS).
[37] Joon-Sung Yang,et al. DRIS-3: Deep Neural Network Reliability Improvement Scheme in 3D Die-Stacked Memory based on Fault Analysis , 2019, 2019 56th ACM/IEEE Design Automation Conference (DAC).
[38] Xiaowei Li,et al. Learn-to-Scale: Parallelizing Deep Learning Inference on Chip Multiprocessor Architecture , 2019, 2019 Design, Automation & Test in Europe Conference & Exhibition (DATE).
[39] M. Breitwisch. Phase Change Memory , 2008, 2008 International Interconnect Technology Conference.
[40] Luca Benini,et al. Learning to infer: RL-based search for DNN primitive selection on Heterogeneous Embedded Systems , 2018, 2019 Design, Automation & Test in Europe Conference & Exhibition (DATE).
[41] Liam McDaid,et al. Scalable Hierarchical Network-on-Chip Architecture for Spiking Neural Network Hardware Implementations , 2013, IEEE Transactions on Parallel and Distributed Systems.
[42] Jia Wang,et al. DaDianNao: A Machine-Learning Supercomputer , 2014, 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture.
[43] Jason Cong,et al. Overcoming Data Transfer Bottlenecks in FPGA-based DNN Accelerators via Layer Conscious Memory Management , 2019, 2019 56th ACM/IEEE Design Automation Conference (DAC).
[44] Jacques-Olivier Klein,et al. Spin-Transfer Torque Magnetic Memory as a Stochastic Memristive Synapse for Neuromorphic Systems , 2015, IEEE Transactions on Biomedical Circuits and Systems.
[45] Catherine Graves,et al. Dot-product engine for neuromorphic computing: Programming 1T1M crossbar to accelerate matrix-vector multiplication , 2016, 2016 53rd ACM/EDAC/IEEE Design Automation Conference (DAC).
[46] Masoud Daneshtalab,et al. CuPAN - High Throughput On-chip Interconnection for Neural Networks , 2015, ICONIP.
[47] Michael Ferdman,et al. Medusa: A Scalable Interconnect for Many-Port DNN Accelerators and Wide DRAM Controller Interfaces , 2018, 2018 28th International Conference on Field Programmable Logic and Applications (FPL).
[48] Dirk Englund,et al. Freely scalable and reconfigurable optical hardware for deep learning , 2020, Scientific Reports.
[49] Vivienne Sze,et al. Eyeriss v2: A Flexible Accelerator for Emerging Deep Neural Networks on Mobile Devices , 2018, IEEE Journal on Emerging and Selected Topics in Circuits and Systems.
[50] Dipankar Das,et al. SIGMA: A Sparse and Irregular GEMM Accelerator with Flexible Interconnects for DNN Training , 2020, 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[51] Yiran Chen,et al. GraphR: Accelerating Graph Processing Using ReRAM , 2017, 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[52] Jie Xu,et al. DeepBurning: Automatic generation of FPGA-based learning accelerators for the Neural Network family , 2016, 2016 53rd ACM/EDAC/IEEE Design Automation Conference (DAC).
[53] Jing Li,et al. Improving the Performance of OpenCL-based FPGA Accelerator for Convolutional Neural Network , 2017, FPGA.
[54] Swagath Venkataramani,et al. Exploiting approximate computing for deep learning acceleration , 2018, 2018 Design, Automation & Test in Europe Conference & Exhibition (DATE).
[55] Hisashi Shima,et al. Resistive Random Access Memory (ReRAM) Based on Metal Oxides , 2010, Proceedings of the IEEE.
[56] Michael Ferdman,et al. Argus: An End-to-End Framework for Accelerating CNNs on FPGAs , 2019, IEEE Micro.
[57] Engin Ipek,et al. Enabling Scientific Computing on Memristive Accelerators , 2018, 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA).
[58] Yao Chen,et al. Cloud-DNN: An Open Framework for Mapping DNN Models to Cloud FPGAs , 2019, FPGA.
[59] Vivienne Sze,et al. Designing Hardware for Machine Learning: The Important Role Played by Circuit Designers , 2017, IEEE Solid-State Circuits Magazine.
[60] Nikhil R. Devanur,et al. PipeDream: generalized pipeline parallelism for DNN training , 2019, SOSP.
[61] Sherief Reda,et al. Coordinated DVFS and Precision Control for Deep Neural Networks , 2019, IEEE Computer Architecture Letters.
[62] David A. Patterson,et al. In-datacenter performance analysis of a tensor processing unit , 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).
[63] D. Stewart,et al. The missing memristor found , 2008, Nature.
[64] Dipankar Das,et al. Manna: An Accelerator for Memory-Augmented Neural Networks , 2019, MICRO.
[65] Sujay Deb,et al. Data-flow Aware CNN Accelerator with Hybrid Wireless Interconnection , 2018, 2018 IEEE 29th International Conference on Application-specific Systems, Architectures and Processors (ASAP).
[66] Tadahiro Kuroda,et al. QUEST: Multi-Purpose Log-Quantized DNN Inference Engine Stacked on 96-MB 3-D SRAM Using Inductive Coupling Technology in 40-nm CMOS , 2019, IEEE Journal of Solid-State Circuits.
[67] Shimeng Yu,et al. Metal–Oxide RRAM , 2012, Proceedings of the IEEE.
[68] Michael Ferdman,et al. Maximizing CNN accelerator efficiency through resource partitioning , 2016, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).
[69] Huanrui Yang,et al. AtomLayer: A Universal ReRAM-Based CNN Accelerator with Atomic Layer Computation , 2018, 2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC).
[70] Hojjat Adeli,et al. Spiking Neural Networks , 2009, Int. J. Neural Syst..
[71] Xiaoming Chen,et al. moDNN: Memory optimal DNN training on GPUs , 2018, 2018 Design, Automation & Test in Europe Conference & Exhibition (DATE).
[72] Midia Reshadi,et al. Flow mapping and data distribution on mesh-based deep learning accelerator , 2019, NOCS.
[73] Jason Cong,et al. Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks , 2015, FPGA.
[74] Wayne Luk,et al. FP-BNN: Binarized neural network on FPGA , 2018, Neurocomputing.
[75] Yehia El-khatib,et al. Adaptive deep learning model selection on embedded systems , 2018, LCTES.
[76] Hyoukjun Kwon,et al. A Communication-Centric Approach for Designing Flexible DNN Accelerators , 2018, IEEE Micro.
[77] Rong Gu,et al. Training Large Scale Deep Neural Networks on the Intel Xeon Phi Many-Core Coprocessor , 2014, 2014 IEEE International Parallel & Distributed Processing Symposium Workshops.
[78] Christoforos E. Kozyrakis,et al. TANGRAM: Optimized Coarse-Grained Dataflow for Scalable NN Accelerators , 2019, ASPLOS.
[79] Xuegong Zhou,et al. A high performance FPGA-based accelerator for large-scale convolutional neural networks , 2016, 2016 26th International Conference on Field Programmable Logic and Applications (FPL).
[80] Tao Zhang,et al. PRIME: A Novel Processing-in-Memory Architecture for Neural Network Computation in ReRAM-Based Main Memory , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).
[81] John Jose,et al. Exploiting Data Resilience in Wireless Network-on-chip Architectures , 2020, ACM J. Emerg. Technol. Comput. Syst..
[82] Tinoosh Mohsenin,et al. BiNMAC: Binarized neural Network Manycore ACcelerator , 2018, ACM Great Lakes Symposium on VLSI.
[83] William J. Dally,et al. Domain-specific hardware accelerators , 2020, Commun. ACM.
[84] Xiaowei Li,et al. FlexFlow: A Flexible Dataflow Accelerator Architecture for Convolutional Neural Networks , 2017, 2017 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[85] Sherief Reda,et al. Hardware acceleration of feature detection and description algorithms on low-power embedded platforms , 2016, 2016 26th International Conference on Field Programmable Logic and Applications (FPL).
[86] Ruixuan Li,et al. AccUDNN: A GPU Memory Efficient Accelerator for Training Ultra-Deep Neural Networks , 2019, 2019 IEEE 37th International Conference on Computer Design (ICCD).
[87] David Patterson,et al. MLPerf Training Benchmark , 2019, MLSys.
[88] Scott A. Mahlke,et al. DeftNN: Addressing Bottlenecks for DNN Execution on GPUs via Synapse Vector Elimination and Near-compute Data Fission , 2017, 2017 50th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[89] Partha Pratim Pande,et al. Wireless NoC as Interconnection Backbone for Multicore Chips: Promises and Challenges , 2012, IEEE Journal on Emerging and Selected Topics in Circuits and Systems.
[90] Vivienne Sze,et al. Using Dataflow to Optimize Energy Efficiency of Deep Neural Network Accelerators , 2017, IEEE Micro.
[91] Denis Foley,et al. Ultra-Performance Pascal GPU and NVLink Interconnect , 2017, IEEE Micro.
[92] Rajesh Gupta,et al. Accelerating Binarized Convolutional Neural Networks with Software-Programmable FPGAs , 2017, FPGA.
[93] Dhabaleswar K. Panda,et al. OC-DNN: Exploiting Advanced Unified Memory Capabilities in CUDA 9 and Volta GPUs for Out-of-Core DNN Training , 2018, 2018 IEEE 25th International Conference on High Performance Computing (HiPC).
[94] Sudhakar Yalamanchili,et al. Neurocube: A Programmable Digital Neuromorphic Architecture with High-Density 3D Memory , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).
[95] Hyoukjun Kwon,et al. MAERI: Enabling Flexible Dataflow Mapping over DNN Accelerators via Reconfigurable Interconnects , 2018, ASPLOS.
[96] Tinoosh Mohsenin,et al. Accelerating convolutional neural network with FFT on tiny cores , 2017, 2017 IEEE International Symposium on Circuits and Systems (ISCAS).
[97] Kiyoung Choi,et al. A scalable processing-in-memory accelerator for parallel graph processing , 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).
[98] Zhenyu Liu,et al. High-Performance FPGA-Based CNN Accelerator With Block-Floating-Point Arithmetic , 2019, IEEE Transactions on Very Large Scale Integration (VLSI) Systems.
[99] Partha Pratim Pande,et al. Design of an Energy-Efficient CMOS-Compatible NoC Architecture with Millimeter-Wave Wireless Interconnects , 2013, IEEE Transactions on Computers.
[100] Fei Qiao,et al. Concrete: A Per-layer Configurable Framework for Evaluating DNN with Approximate Operators , 2019, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[101] Kiyoung Choi,et al. Efficient FPGA acceleration of Convolutional Neural Networks using logical-3D compute array , 2016, 2016 Design, Automation & Test in Europe Conference & Exhibition (DATE).
[102] Radu Marculescu,et al. Hybrid network-on-chip architectures for accelerating deep learning kernels on heterogeneous manycore platforms , 2016, 2016 International Conference on Compilers, Architectures, and Synthesis of Embedded Systems (CASES).
[103] Masoud Daneshtalab,et al. EbDa: A new theory on design and verification of deadlock-free interconnection networks , 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).
[104] Xueti Tang,et al. Spin-transfer torque magnetic random access memory (STT-MRAM) , 2013, JETC.
[105] Tajana Simunic,et al. FloatPIM: In-Memory Acceleration of Deep Neural Network Training with High Precision , 2019, 2019 ACM/IEEE 46th Annual International Symposium on Computer Architecture (ISCA).
[106] Yu Cao,et al. Interconnect-Aware Area and Energy Optimization for In-Memory Acceleration of DNNs , 2020, IEEE Design & Test.
[107] Siddharth Joshi,et al. Ferroelectric ternary content-addressable memory for one-shot learning , 2019, Nature Electronics.
[108] Maurizio Palesi,et al. Exploiting Data Resilience in Wireless Network-on-chip Architectures , 2020, ACM J. Emerg. Technol. Comput. Syst..
[109] Joel Emer,et al. Eyeriss: an Energy-efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks Accessed Terms of Use , 2022 .
[110] Kunle Olukotun,et al. DAWNBench: An End-to-End Deep Learning Benchmark and Competition , 2017.
[111] Paul Ampadu,et al. Energy-efficient and high-performance NoC architecture and mapping solution for deep neural networks , 2019, NOCS.
[112] Jim D. Garside,et al. Overview of the SpiNNaker System Architecture , 2013, IEEE Transactions on Computers.
[113] Christoforos E. Kozyrakis,et al. TETRIS: Scalable and Efficient Neural Network Acceleration with 3D Memory , 2017, ASPLOS.
[114] Xi Chen,et al. FP-DNN: An Automated Framework for Mapping Deep Neural Networks onto FPGAs with RTL-HLS Hybrid Templates , 2017, 2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM).
[115] Tajana Simunic,et al. GRAM: graph processing in a ReRAM-based computational memory , 2019, ASP-DAC.
[116] Bruce M. Maggs,et al. On-line algorithms for path selection in a nonblocking network , 1990, STOC '90.
[117] Haichen Shen,et al. Nexus: a GPU cluster engine for accelerating DNN-based video analysis , 2019, SOSP.
[118] Shengen Yan,et al. GradientFlow: Optimizing Network Performance for Large-Scale Distributed DNN Training , 2019, IEEE Transactions on Big Data.
[119] Amir Masoud Rahmani,et al. DNN pruning and mapping on NoC-Based communication infrastructure , 2019, Microelectron. J..
[120] George Bosilca,et al. Efficient parallelization of batch pattern training algorithm on many-core and cluster architectures , 2013, 2013 IEEE 7th International Conference on Intelligent Data Acquisition and Advanced Computing Systems (IDAACS).
[121] Jinjun Xiong,et al. DNNBuilder: an Automated Tool for Building High-Performance DNN Hardware Accelerators for FPGAs , 2018, 2018 IEEE/ACM International Conference on Computer-Aided Design (ICCAD).
[122] John Jose,et al. Approximate Wireless Networks-on-Chip , 2018, 2018 Conference on Design of Circuits and Integrated Systems (DCIS).
[123] Kun-Chih Chen,et al. NoC-based DNN accelerator: a future design paradigm , 2019, NOCS.
[124] Vivienne Sze,et al. Efficient Processing of Deep Neural Networks: A Tutorial and Survey , 2017, Proceedings of the IEEE.