Hybrid network-on-chip architectures for accelerating deep learning kernels on heterogeneous manycore platforms
Radu Marculescu | Partha Pratim Pande | Diana Marculescu | Janardhan Rao Doppa | Wonje Choi | Ryan Gary Kim | Karthi Duraisamy
[1] Nam Sung Kim, et al. GPUWattch: enabling energy optimizations in GPGPUs, 2013, ISCA.
[2] Kevin Skadron, et al. Rodinia: A benchmark suite for heterogeneous computing, 2009, 2009 IEEE International Symposium on Workload Characterization (IISWC).
[3] Tao Wang, et al. Deep learning with COTS HPC systems, 2013, ICML.
[4] Radu Marculescu, et al. "It's a small world after all": NoC performance optimization via long-range link insertion, 2006, IEEE Transactions on Very Large Scale Integration (VLSI) Systems.
[5] Olav Lysne, et al. Layered routing in irregular networks, 2006, IEEE Transactions on Parallel and Distributed Systems.
[6] David A. Wood, et al. GPU Computing Pipeline Inefficiencies and Optimization Opportunities in Heterogeneous CPU-GPU Processors, 2015, 2015 IEEE International Symposium on Workload Characterization.
[7] Jia Wang, et al. DaDianNao: A Machine-Learning Supercomputer, 2014, 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture.
[8] Terrence Mak, et al. A Survey of Emerging Interconnects for On-Chip Efficient Multicast and Broadcast in Many-Cores, 2016, IEEE Circuits and Systems Magazine.
[9] Partha Pratim Pande, et al. Wireless NoC as Interconnection Backbone for Multicore Chips: Promises and Challenges, 2012, IEEE Journal on Emerging and Selected Topics in Circuits and Systems.
[10] Ujjwal Maulik, et al. A Simulated Annealing-Based Multiobjective Optimization Algorithm: AMOSA, 2008, IEEE Transactions on Evolutionary Computation.
[11] Partha Pratim Pande, et al. Design of an Energy-Efficient CMOS-Compatible NoC Architecture with Millimeter-Wave Wireless Interconnects, 2013, IEEE Transactions on Computers.
[12] Andrew R. Barron, et al. Universal approximation bounds for superpositions of a sigmoidal function, 1993, IEEE Trans. Inf. Theory.
[13] Demis Hassabis, et al. Mastering the game of Go with deep neural networks and tree search, 2016, Nature.
[14] Partha Pratim Pande, et al. Enhancing performance of wireless NoCs with distributed MAC protocols, 2015, Sixteenth International Symposium on Quality Electronic Design.
[15] Guigang Zhang, et al. Deep Learning, 2016, Int. J. Semantic Comput.
[16] David R. Kaeli, et al. Asymmetric NoC Architectures for GPU Systems, 2015, NOCS.
[17] David A. Wood, et al. Heterogeneous system coherence for integrated CPU-GPU systems, 2013, 2013 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[18] Yue Ping Zhang, et al. Propagation Mechanisms of Radio Waves Over Intra-Chip Channels With Integrated Antennas: Frequency-Domain Measurements and Time-Domain Analysis, 2007, IEEE Transactions on Antennas and Propagation.
[19] Masoud Daneshtalab, et al. Reconfigurable communication fabric for efficient implementation of neural networks, 2015, 2015 10th International Symposium on Reconfigurable Communication-centric Systems-on-Chip (ReCoSoC).
[20] Indrani Paul, et al. Achieving Exascale Capabilities through Heterogeneous Computing, 2015, IEEE Micro.
[21] Geoffrey E. Hinton, et al. Learning representations by back-propagating errors, 1986, Nature.
[22] Niraj K. Jha, et al. GARNET: A detailed on-chip network model inside a full-system simulator, 2009, 2009 IEEE International Symposium on Performance Analysis of Systems and Software.
[23] Klaus Kofler, et al. Performance and Scalability of GPU-Based Convolutional Neural Networks, 2010, 2010 18th Euromicro Conference on Parallel, Distributed and Network-based Processing.
[24] Partha Pratim Pande, et al. Design Space Exploration for Wireless NoCs Incorporating Irregular Network Routing, 2014, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.
[25] David A. Wood, et al. gem5-gpu: A Heterogeneous CPU-GPU Simulator, 2015, IEEE Computer Architecture Letters.
[26] John Kim, et al. Throughput-Effective On-Chip Networks for Manycore Accelerators, 2010, 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture.
[27] Sudhakar Yalamanchili, et al. Design space exploration of on-chip ring interconnection for a CPU-GPU heterogeneous architecture, 2013, J. Parallel Distributed Comput.
[28] Wim Bogaerts, et al. Design Challenges in Silicon Photonics, 2014, IEEE Journal of Selected Topics in Quantum Electronics.
[29] A. Sugavanam, et al. Wireless communication in a flip-chip package using integrated antennas on silicon substrates, 2005, IEEE Electron Device Letters.
[30] Sudhakar Yalamanchili, et al. Adaptive virtual channel partitioning for network-on-chip in heterogeneous architectures, 2013, ACM Trans. Design Autom. Electr. Syst.
[31] Mahmut T. Kandemir, et al. Managing GPU Concurrency in Heterogeneous Architectures, 2014, 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture.
[32] Jim D. Garside, et al. SpiNNaker: A 1-W 18-Core System-on-Chip for Massively-Parallel Neural Network Simulation, 2013, IEEE Journal of Solid-State Circuits.
[33] Jinchun Kim, et al. Bandwidth-efficient on-chip interconnect designs for GPGPUs, 2015, 2015 52nd ACM/EDAC/IEEE Design Automation Conference (DAC).
[34] Ran Ginosar, et al. Network-on-Chip Architectures for Neural Networks, 2010, 2010 Fourth ACM/IEEE International Symposium on Networks-on-Chip.
[35] Yu Su, et al. Communication Using Antennas Fabricated in Silicon Integrated Circuits, 2007, IEEE Journal of Solid-State Circuits.