An efficient dataflow accelerator for scientific applications
Dongrui Fan, Meng Wu, Hao Zhang, Xu Tan, Da Wang, Yujing Feng, Xiaochun Ye, Songwen Pei
[1] P. Sadayappan et al. High-performance code generation for stencil computations on GPU architectures, 2012, ICS '12.
[2] Steven W. Smith et al. The Scientist and Engineer's Guide to Digital Signal Processing, 1997.
[3] Zhimin Zhang et al. A Non-Stop Double Buffering Mechanism for Dataflow Architecture, 2017, Journal of Computer Science and Technology.
[4] Dongrui Fan et al. A Pipelining Loop Optimization Method for Dataflow Architecture, 2017, Journal of Computer Science and Technology.
[5] Avi Mendelson et al. The TERAFLUX Project: Exploiting the DataFlow Paradigm in Next Generation Teradevices, 2013, 2013 Euromicro Conference on Digital System Design.
[6] Wen-mei W. Hwu et al. Parboil: A Revised Benchmark Suite for Scientific and Commercial Throughput Computing, 2012.
[7] D. Oriato et al. Acceleration of a Meteorological Limited Area Model with Dataflow Engines, 2012, 2012 Symposium on Application Accelerators in High Performance Computing.
[8] Zhimin Zhang et al. POSTER: An optimization of dataflow architectures for scientific applications, 2016, 2016 International Conference on Parallel Architecture and Compilation Techniques (PACT).
[9] Guangwen Yang et al. Scaling Reverse Time Migration Performance through Reconfigurable Dataflow Engines, 2014, IEEE Micro.
[10] Wil Plouffe et al. An asynchronous programming language and computing machine, 1978.
[11] Amin Ansari et al. Bundled execution of recurring traces for energy-efficient general purpose processing, 2011, 2011 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[12] Guang R. Gao et al. An Implementation of the Codelet Model, 2013, Euro-Par.
[13] Dongrui Fan et al. SimICT: A fast and flexible framework for performance and power evaluation of large-scale architecture, 2013, International Symposium on Low Power Electronics and Design (ISLPED).
[14] Jack J. Dongarra et al. Autotuning GEMM Kernels for the Fermi GPU, 2012, IEEE Transactions on Parallel and Distributed Systems.
[15] Wu-chun Feng et al. Towards a performance-portable FFT library for heterogeneous computing, 2014, Conf. Computing Frontiers.
[16] Benoît Meister et al. Runnemede: An architecture for Ubiquitous High-Performance Computing, 2013, 2013 IEEE 19th International Symposium on High Performance Computer Architecture (HPCA).
[17] Lizy Kurian John et al. Scaling to the end of silicon with EDGE architectures, 2004, Computer.
[18] Oliver Pell et al. Maximum Performance Computing with Dataflow Engines, 2012, Computing in Science & Engineering.
[19] Arvind et al. Executing a Program on the MIT Tagged-Token Dataflow Architecture, 1990, IEEE Trans. Computers.
[20] Randy H. Katz et al. A Berkeley View of Systems Challenges for AI, 2017, ArXiv.
[21] Frederico Pratas et al. Accelerating the Computation of Induced Dipoles for Molecular Mechanics with Dataflow Engines, 2013, 2013 IEEE 21st Annual International Symposium on Field-Programmable Custom Computing Machines.
[22] Jack B. Dennis et al. First version of a data flow procedure language, 1974, Symposium on Programming.
[23] Dongrui Fan et al. SmarCo: An Efficient Many-Core Processor for High-Throughput Applications in Datacenters, 2018, 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[24] Zhimin Zhang et al. Memory partition for SIMD in streaming dataflow architectures, 2016, 2016 Seventh International Green and Sustainable Computing Conference (IGSC).
[25] Zhimin Zhang et al. An Efficient Network-on-Chip Router for Dataflow Architecture, 2017, Journal of Computer Science and Technology.
[26] Steven Swanson et al. The WaveScalar architecture, 2007, TOCS.
[27] Samuel Williams et al. Stencil computation optimization and auto-tuning on state-of-the-art multicore architectures, 2008, 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis.
[28] Bo Kågström et al. GEMM-based level 3 BLAS: high-performance model implementations and performance evaluation benchmark, 1998, TOMS.
[29] Karthikeyan Sankaralingam et al. Dynamically Specialized Datapaths for energy efficient computing, 2011, 2011 IEEE 17th International Symposium on High Performance Computer Architecture.
[30] Yoav Etsion et al. Single-graph multiple flows: Energy efficient design alternative for GPGPUs, 2014, 2014 ACM/IEEE 41st International Symposium on Computer Architecture (ISCA).