An Architectural Framework for Accelerating Dynamic Parallel Algorithms on Reconfigurable Hardware
Shreesha Srinath | Christopher Batten | Tao Chen | G. Edward Suh
[1] Shunning Jiang, et al. Mamba: Closing the Performance Gap in Productive Hardware Development Frameworks, 2018, 2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC).
[2] Vamsi Boppana, et al. A 16-nm Multiprocessing System-on-Chip Field-Programmable Gate Array Platform, 2016, IEEE Micro.
[3] Christopher J. Hughes, et al. Carbon: architectural support for fine-grained parallelism on chip multiprocessors, 2007, ISCA '07.
[4] Yao Wang, et al. Aggressive pipelining of irregular applications on reconfigurable hardware, 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).
[5] Matteo Frigo, et al. The implementation of the Cilk-5 multithreaded language, 1998, PLDI.
[6] C. A. R. Hoare, et al. Algorithm 64: Quicksort, 1961, Commun. ACM.
[7] Selim G. Akl, et al. Optimal Parallel Merging and Sorting Without Memory Conflicts, 1987, IEEE Transactions on Computers.
[8] Satnam Singh, et al. Kiwi: Synthesis of FPGA Circuits from Parallel Programs, 2008, 2008 16th International Symposium on Field-Programmable Custom Computing Machines.
[9] Ioana Burcea, et al. A compiler and runtime for heterogeneous computing, 2012, DAC Design Automation Conference 2012.
[10] Christopher Batten, et al. PyMTL: A Unified Framework for Vertically Integrated Computer Architecture Research, 2014, 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture.
[11] Jung Ho Ahn, et al. McPAT: An integrated power, area, and timing modeling framework for multicore and manycore architectures, 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[12] Stephen L. Olivier, et al. UTS: An Unbalanced Tree Search Benchmark, 2006, LCPC.
[13] Alejandro Duran, et al. The Design of OpenMP Tasks, 2009, IEEE Transactions on Parallel and Distributed Systems.
[14] Charles E. Leiserson, et al. The Cilk++ concurrency platform, 2009, 2009 46th ACM/IEEE Design Automation Conference.
[15] Gu-Yeon Wei, et al. MachSuite: Benchmarks for accelerator design and customized architectures, 2014, 2014 IEEE International Symposium on Workload Characterization (IISWC).
[16] F. Warren Burton, et al. Executing functional programs on a virtual tree of processors, 1981, FPCA '81.
[17] Michael E. Wolf, et al. The cache performance and optimizations of blocked algorithms, 1991, ASPLOS IV.
[18] Somayeh Sardashti, et al. The gem5 simulator, 2011, CARN.
[19] Stephen D. Brown, et al. From Pthreads to Multicore Hardware Systems in LegUp High-Level Synthesis for FPGAs, 2017, IEEE Transactions on Very Large Scale Integration (VLSI) Systems.
[20] Tao Chen, et al. Efficient data supply for hardware accelerators with prefetching and access/execute decoupling, 2016, 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[21] Robert D. Blumofe, et al. Scheduling multithreaded computations by work stealing, 1994, Proceedings 35th Annual Symposium on Foundations of Computer Science.
[22] Mike Hutton. Stratix® 10: 14nm FPGA delivering 1GHz, 2015, 2015 IEEE Hot Chips 27 Symposium (HCS).
[23] Kunle Olukotun, et al. Automatic Generation of Efficient Accelerators for Reconfigurable Hardware, 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).
[24] Jeffrey Stuecheli, et al. CAPI: A Coherent Accelerator Processor Interface, 2015, IBM J. Res. Dev.
[25] James Reinders, et al. Intel threading building blocks - outfitting C++ for multi-core processor parallelism, 2007.
[26] George A. Constantinides, et al. A Case for Work-stealing on FPGAs with OpenCL Atomics, 2016, FPGA.
[27] Bradley C. Kuszmaul, et al. Cilk: an efficient multithreaded runtime system, 1995, PPOPP '95.