Aggressive pipelining of irregular applications on reconfigurable hardware
Yao Wang | Yangdong Deng | Leibo Liu | Shouyi Yin | Shaojun Wei | Zhaoshi Li
[1] Karthikeyan Sankaralingam,et al. Efficient execution of memory access phases using dataflow specialization , 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).
[2] Pradeep Dubey,et al. Navigating the maze of graph analytics frameworks using massive graph datasets , 2014, SIGMOD Conference.
[3] Yu Wang,et al. FPGP: Graph Processing Framework on FPGA: A Case Study of Breadth-First Search , 2016, FPGA.
[4] Edward A. Lee,et al. Scheduling dynamic dataflow graphs with bounded memory using the token flow model , 1993, 1993 IEEE International Conference on Acoustics, Speech, and Signal Processing.
[5] Jason Cong,et al. A quantitative analysis on microarchitectures of modern CPU-FPGA platforms , 2016, 2016 53rd ACM/EDAC/IEEE Design Automation Conference (DAC).
[6] Nir Shavit,et al. Software transactional memory , 1995, PODC '95.
[7] Jason Cong,et al. High-Level Synthesis for FPGAs: From Prototyping to Deployment , 2011, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.
[8] Keshav Pingali,et al. The tao of parallelism in algorithms , 2011, PLDI '11.
[9] Eduard Ayguadé,et al. Loop level speculation in a task based programming model , 2013, 20th Annual International Conference on High Performance Computing.
[10] Alejandro Duran,et al. Barcelona OpenMP Tasks Suite: A Set of Benchmarks Targeting the Exploitation of Task Parallelism in OpenMP , 2009, 2009 International Conference on Parallel Processing.
[11] K. Keutzer,et al. System-level design: orthogonalization of concerns and platform-based design , 2000, IEEE Trans. Comput. Aided Des. Integr. Circuits Syst..
[12] Kunle Olukotun,et al. Generating Configurable Hardware from Parallel Patterns , 2015, International Conference on Architectural Support for Programming Languages and Operating Systems.
[13] David F. Bacon,et al. FPGA programming for the masses , 2013, CACM.
[14] Edward A. Lee,et al. A framework for comparing models of computation , 1998, IEEE Trans. Comput. Aided Des. Integr. Circuits Syst..
[15] Wu-chun Feng,et al. OpenDwarfs: Characterization of Dwarf-Based Benchmarks on Fixed and Reconfigurable Architectures , 2016, J. Signal Process. Syst..
[16] Yu Zhang,et al. Enabling FPGAs in the cloud , 2014, Conf. Computing Frontiers.
[17] Yong Wang,et al. SDA: Software-defined accelerator for large-scale DNN systems , 2014, 2014 IEEE Hot Chips 26 Symposium (HCS).
[18] Christian Haubelt,et al. Electronic System-Level Synthesis Methodologies , 2009, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.
[19] Jason Cong,et al. Efficient compilation of CUDA kernels for high-performance computing on FPGAs , 2013, TECS.
[20] Kunle Olukotun,et al. Automatic Generation of Efficient Accelerators for Reconfigurable Hardware , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).
[21] Keshav Pingali,et al. Optimistic parallelism requires abstractions , 2007, PLDI '07.
[22] Implementing FPGA Design with the OpenCL Standard , 2010 .
[23] Arvind,et al. Executing a Program on the MIT Tagged-Token Dataflow Architecture , 1990, IEEE Trans. Computers.
[24] Keshav Pingali,et al. How much parallelism is there in irregular applications? , 2009, PPoPP '09.
[25] Robert J. Halstead,et al. High-Level Language Tools for Reconfigurable Computing , 2015, Proceedings of the IEEE.
[26] Bratin Saha,et al. McRT-STM: a high performance software transactional memory system for a multi-core runtime , 2006, PPoPP '06.
[27] Karthikeyan Sankaralingam,et al. Exploring the potential of heterogeneous Von Neumann/dataflow execution models , 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).
[28] Ken Kennedy,et al. Optimizing Compilers for Modern Architectures: A Dependence-based Approach , 2001 .
[29] William J. Dally,et al. Allocator implementations for network-on-chip routers , 2009, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis.
[30] Viktor K. Prasanna,et al. Accelerating Large-Scale Single-Source Shortest Path on FPGA , 2015, 2015 IEEE International Parallel and Distributed Processing Symposium Workshop.
[31] Magnus Jahre,et al. Hybrid breadth-first search on a single-chip FPGA-CPU heterogeneous platform , 2015, 2015 25th International Conference on Field Programmable Logic and Applications (FPL).
[32] Arun Raman,et al. Speculative parallelization using software multi-threaded transactions , 2010, ASPLOS XV.
[33] Keshav Pingali,et al. Kinetic Dependence Graphs , 2015, ASPLOS.
[34] Joshua S. Auerbach,et al. Lime: a Java-compatible and synthesizable language for heterogeneous architectures , 2010, OOPSLA.
[35] Josep Torrellas,et al. A Chip-Multiprocessor Architecture with Speculative Multithreading , 1999, IEEE Trans. Computers.
[36] Ayal Zaks,et al. Speculative separation for privatization and reductions , 2012, PLDI.
[37] J. Shewchuk,et al. Streaming computation of Delaunay triangulations , 2006, SIGGRAPH '06.
[38] David F. Bacon,et al. FPGA Programming for the Masses , 2013, ACM Queue.
[39] Charles E. Leiserson,et al. A work-efficient parallel breadth-first search algorithm (or how to cope with the nondeterminism of reducers) , 2010, SPAA '10.
[40] Yoav Etsion,et al. Single-graph multiple flows: Energy efficient design alternative for GPGPUs , 2014, 2014 ACM/IEEE 41st International Symposium on Computer Architecture (ISCA).
[41] Guy E. Blelloch,et al. Internally deterministic parallel algorithms can be fast , 2012, PPoPP '12.
[42] Arthur B. Maccabe,et al. The program dependence web: a representation supporting control-, data-, and demand-driven interpretation of imperative languages , 1990, PLDI '90.
[43] Kunle Olukotun,et al. GraphOps: A Dataflow Library for Graph Analytics Acceleration , 2016, FPGA.
[44] Deming Chen,et al. Efficient compilation of CUDA kernels for high-performance computing on FPGAs , 2013 .
[45] Antonia Zhai,et al. Triggered instructions: a control paradigm for spatially-programmed architectures , 2013, ISCA.
[46] George A. Constantinides,et al. High-level synthesis of dynamic data structures: A case study using Vivado HLS , 2013, 2013 International Conference on Field-Programmable Technology (FPT).
[47] Wayne Luk,et al. CASK: Open-Source Custom Architectures for Sparse Kernels , 2016, FPGA.
[48] Andrew V. Goldberg,et al. Shortest paths algorithms: Theory and experimental evaluation , 1994, SODA '94.
[49] Keshav Pingali,et al. Ordered vs. unordered: a comparison of parallelism and work-efficiency in irregular algorithms , 2011, PPoPP '11.