An Efficient Multicast Router using Shared-Buffer with Packet Merging for Dataflow Architecture
Meng Wu | Yi Li | Xiaochun Ye | Dongrui Fan | Rui Xue | Dan Li | Wenming Li | Yuqing Ji
[1] Pradeep Dubey,et al. 3.5-D Blocking Optimization for Stencil Computations on Modern CPUs and GPUs , 2010, 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis.
[2] Li-Shiuan Peh,et al. Towards the ideal on-chip fabric for 1-to-many and many-to-1 communication , 2011, 2011 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[3] Zhimin Zhang,et al. Memory partition for SIMD in streaming dataflow architectures , 2016, 2016 Seventh International Green and Sustainable Computing Conference (IGSC).
[4] Jack B. Dennis,et al. A preliminary architecture for a basic data-flow processor , 1974, ISCA '75.
[5] Abdoulaye Gamatié,et al. Distributed and dynamic shared-buffer router for high-performance interconnect , 2017, 2017 Eleventh IEEE/ACM International Symposium on Networks-on-Chip (NOCS).
[6] Bevan M. Baas,et al. RoShaQ: High-performance on-chip router with shared queues , 2011, 2011 IEEE 29th International Conference on Computer Design (ICCD).
[7] Kathryn S. McKinley,et al. Static placement, dynamic issue (SPDI) scheduling for EDGE architectures , 2004, Proceedings. 13th International Conference on Parallel Architecture and Compilation Techniques, 2004. PACT 2004.
[8] Axel Jantsch,et al. Connection-oriented multicasting in wormhole-switched networks on chip , 2006, IEEE Computer Society Annual Symposium on Emerging VLSI Technologies and Architectures (ISVLSI'06).
[9] Nikolaus A. Adams,et al. Numerical simulation of fluid flow on complex geometries using the Lattice-Boltzmann method and CUDA-enabled GPUs , 2009, SIGGRAPH '09.
[10] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.
[11] Xiaola Lin,et al. Deadlock-Free Multicast Wormhole Routing in 2-D Mesh Multicomputers , 1994, IEEE Trans. Parallel Distributed Syst.
[12] Jack J. Dongarra,et al. Autotuning GEMM Kernels for the Fermi GPU , 2012, IEEE Transactions on Parallel and Distributed Systems.
[13] Wu-chun Feng,et al. Towards a performance-portable FFT library for heterogeneous computing , 2014, Conf. Computing Frontiers.
[14] Zhimin Zhang,et al. An Efficient Network-on-Chip Router for Dataflow Architecture , 2017, Journal of Computer Science and Technology.
[15] Frank Mueller,et al. Autogeneration and Autotuning of 3D Stencil Codes on Homogeneous and Heterogeneous GPU Clusters , 2013, IEEE Transactions on Parallel and Distributed Systems.
[16] Hicham G. Elmongui,et al. Use of CUDA streams for block-based MPEG motion estimation on the GPU , 2012, SIGGRAPH '12.
[17] Zhimin Zhang,et al. A Non-Stop Double Buffering Mechanism for Dataflow Architecture , 2017, Journal of Computer Science and Technology.
[18] Rui Xue,et al. A Sharing Path Awareness Scheduling Algorithm for Dataflow Architecture , 2019, 2019 IEEE 21st International Conference on High Performance Computing and Communications; IEEE 17th International Conference on Smart City; IEEE 5th International Conference on Data Science and Systems (HPCC/SmartCity/DSS).
[19] Dongrui Fan,et al. A Pipelining Loop Optimization Method for Dataflow Architecture , 2017, Journal of Computer Science and Technology.
[20] Vikram S. Adve,et al. LLVM: a compilation framework for lifelong program analysis & transformation , 2004, International Symposium on Code Generation and Optimization, 2004. CGO 2004.
[21] Veljko M. Milutinovic,et al. Guide to DataFlow Supercomputing , 2015, Computer Communications and Networks.
[22] Lizy Kurian John,et al. Scaling to the end of silicon with EDGE architectures , 2004, Computer.
[23] Jack B. Dennis,et al. First version of a data flow procedure language , 1974, Symposium on Programming.
[24] Dongrui Fan,et al. SimICT: A fast and flexible framework for performance and power evaluation of large-scale architecture , 2013, International Symposium on Low Power Electronics and Design (ISLPED).
[25] Bill Lin,et al. A High-Throughput Distributed Shared-Buffer NoC Router , 2009, IEEE Computer Architecture Letters.
[26] Simha Sethumadhavan,et al. Distributed Microarchitectural Protocols in the TRIPS Prototype Processor , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).