PIT: Processing-In-Transmission With Fine-Grained Data Manipulation Networks
Nanning Zheng | Wenzhe Zhao | Pengju Ren | Pengchen Zong | Jianming Tong | Tian Xia | Haoran Zhao | Zehua Li
[1] Arvind,et al. Terabyte Sort on FPGA-Accelerated Flash Storage , 2017, 2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM).
[2] H.C. Neto,et al. Memory Optimized Architecture for Efficient Gauss-Jordan Matrix Inversion , 2007, 2007 3rd Southern Conference on Programmable Logic.
[3] Vivienne Sze,et al. Eyeriss v2: A Flexible Accelerator for Emerging Deep Neural Networks on Mobile Devices , 2018, IEEE Journal on Emerging and Selected Topics in Circuits and Systems.
[4] Song Han,et al. Learning both Weights and Connections for Efficient Neural Network , 2015, NIPS.
[5] Vivienne Sze,et al. Efficient Processing of Deep Neural Networks: A Tutorial and Survey , 2017, Proceedings of the IEEE.
[6] Natalie D. Enright Jerger,et al. Cnvlutin: Ineffectual-Neuron-Free Deep Neural Network Computing , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).
[7] Hiroshi Inoue,et al. SIMD- and Cache-Friendly Algorithm for Sorting an Array of Structures , 2015, Proc. VLDB Endow.
[8] Natalie D. Enright Jerger,et al. On-Chip Networks , 2009, On-Chip Networks.
[9] Yuanyuan Yang,et al. A New Self-Routing Multicast Network , 1999, IEEE Trans. Parallel Distributed Syst.
[10] Song Han,et al. Deep Compression: Compressing Deep Neural Network with Pruning, Trained Quantization and Huffman Coding , 2015, ICLR.
[11] Valery Sklyarov,et al. Hardware implementation of recursive sorting algorithms , 2011, 2011 International Conference on Electronic Devices, Systems and Applications (ICEDSA).
[12] Gustavo Alonso,et al. Sorting networks on FPGAs , 2012, The VLDB Journal.
[13] William J. Dally,et al. Principles and Practices of Interconnection Networks , 2004.
[14] Jean C. Walrand,et al. A Benes packet network , 2012, 2013 Proceedings IEEE INFOCOM.
[15] Yann LeCun,et al. 1.1 Deep Learning Hardware: Past, Present, and Future , 2019, 2019 IEEE International Solid-State Circuits Conference - (ISSCC).
[16] Nanning Zheng,et al. COCOA: Content-Oriented Configurable Architecture Based on Highly-Adaptive Data Transmission Networks , 2020, ACM Great Lakes Symposium on VLSI.
[17] David A. Patterson,et al. A new golden age for computer architecture , 2019, Commun. ACM.
[18] Hyoukjun Kwon,et al. MAERI: Enabling Flexible Dataflow Mapping over DNN Accelerators via Reconfigurable Interconnects , 2018, ASPLOS.
[19] Shaoli Liu,et al. Cambricon-X: An accelerator for sparse neural networks , 2016, 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[20] Igor L. Markov,et al. Limits on fundamental limits to computation , 2014, Nature.
[21] Brian Vinter,et al. CSR5: An Efficient Storage Format for Cross-Platform Sparse Matrix-Vector Multiplication , 2015, ICS.
[22] Timothy A. Davis,et al. The University of Florida sparse matrix collection , 2011, TOMS.
[23] Zhan Wang,et al. A New Traffic Offloading Method with Slow Switching Optical Device in Exascale Computer , 2019, 2019 IEEE 37th International Conference on Computer Design (ICCD).
[24] Mohamed Ibrahim,et al. Efficient and Fair Multi-programming in GPUs via Effective Bandwidth Management , 2018, 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[25] Dipankar Das,et al. SIGMA: A Sparse and Irregular GEMM Accelerator with Flexible Interconnects for DNN Training , 2020, 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA).