ad-heap: an Efficient Heap Data Structure for Asymmetric Multicore Processors
[1] Lieven Eeckhout,et al. Scheduling heterogeneous multi-cores through performance impact estimation (PIE) , 2012, 2012 39th Annual International Symposium on Computer Architecture (ISCA).
[2] Margaret Martonosi,et al. Reducing GPU offload latency via fine-grained CPU-GPU synchronization , 2013, 2013 IEEE 19th International Symposium on High Performance Computer Architecture (HPCA).
[3] Maurice Steinman,et al. AMD Fusion APU: Llano , 2012, IEEE Micro.
[4] William J. Dally,et al. GPUs and the Future of Parallel Computing , 2011, IEEE Micro.
[5] Cédric Augonnet,et al. StarPU: a unified platform for task scheduling on heterogeneous multicore architectures , 2011, Concurr. Comput. Pract. Exp..
[6] Richard E. Ladner,et al. The influence of caches on the performance of heaps , 1996, JEAL.
[7] Jie Shen,et al. Glinda: a framework for accelerating imbalanced applications on heterogeneous platforms , 2013, CF '13.
[8] Mark Allen Weiss,et al. Data structures and algorithm analysis , 1991 .
[9] Gagan Agrawal,et al. Accelerating MapReduce on a coupled CPU-GPU architecture , 2012, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis.
[10] James C. Hoe,et al. Single-Chip Heterogeneous Computing: Does the Future Include Custom Logic, FPGAs, and GPGPUs? , 2010, 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture.
[11] Sean Keely,et al. Parallel suffix array and least common prefix for the GPU , 2013, PPoPP '13.
[12] Hsien-Hsin S. Lee,et al. Chameleon: Virtualizing idle acceleration cores of a heterogeneous multicore processor for caching and prefetching , 2010, TACO.
[13] Mark D. Hill,et al. Amdahl's Law in the Multicore Era , 2008, Computer.
[14] Hyesoon Kim,et al. Qilin: Exploiting parallelism on heterogeneous multiprocessors with adaptive mapping , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[15] George Varghese,et al. A 22nm IA multi-CPU and GPU System-on-Chip , 2012, 2012 IEEE International Solid-State Circuits Conference.
[16] Takakazu Kurokawa,et al. Power Efficiency Evaluation of Block Ciphers on GPU-Integrated Multicore Processor , 2012, ICA3PP.
[17] David R. Kaeli,et al. Valar: a benchmark suite to study the dynamic behavior of heterogeneous systems , 2013, GPGPU@ASPLOS.
[18] John Paul Shen,et al. Mitigating Amdahl's law through EPI throttling , 2005, 32nd International Symposium on Computer Architecture (ISCA'05).
[19] Lieven Eeckhout,et al. Understanding fundamental design choices in single-ISA heterogeneous multicore architectures , 2013, TACO.
[20] Adam P. Hill,et al. An On-chip Heterogeneous Implementation of a General Sparse Linear Solver , 2013, 2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum.
[21] Anand Raghunathan,et al. Automatic generation of software pipelines for heterogeneous parallel systems , 2012, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis.
[22] Scott B. Baden,et al. Redefining the Role of the CPU in the Era of CPU-GPU Integration , 2012, IEEE Micro.
[23] Yi Yang,et al. CPU-assisted GPGPU on fused CPU-GPU architectures , 2012, IEEE International Symposium on High-Performance Computer Architecture.
[24] Bingsheng He,et al. Revisiting Co-Processing for Hash Joins on the Coupled CPU-GPU Architecture , 2013, Proc. VLDB Endow..
[25] Norman P. Jouppi,et al. Single-ISA heterogeneous multi-core architectures for multithreaded workload performance , 2004, Proceedings. 31st Annual International Symposium on Computer Architecture, 2004..
[26] Wu-chun Feng,et al. On the Efficacy of a Fused CPU+GPU Processor (or APU) for Parallel Computing , 2011, 2011 Symposium on Application Accelerators in High-Performance Computing.
[27] Laxmi N. Bhuyan,et al. A dynamic self-scheduling scheme for heterogeneous multiprocessor architectures , 2013, TACO.
[28] Rakesh Kumar,et al. Single-ISA Heterogeneous Multi-Core Architectures for Multithreaded Workload Performance , 2004 .
[29] Donald B. Johnson,et al. Priority Queues with Update and Finding Minimum Spanning Trees , 1975, Inf. Process. Lett..
[30] Hsien-Hsin S. Lee,et al. COMPASS: a programmable data prefetcher using idle GPU shaders , 2010, ASPLOS XV.
[31] Dong Li,et al. The tradeoffs of fused memory hierarchies in heterogeneous computing architectures , 2012, CF '12.
[32] David Abrahams,et al. C++ Template Metaprogramming: Concepts, Tools, and Techniques from Boost and Beyond (C++ In-Depth Series) , 2004 .
[33] José Nelson Amaral,et al. Using SIMD registers and instructions to enable instruction-level parallelism in sorting algorithms , 2007, SPAA '07.
[34] Sudhakar Yalamanchili,et al. Accelerating simulation of agent-based models on heterogeneous architectures , 2013, GPGPU@ASPLOS.
[35] Lieven Eeckhout,et al. Understanding fundamental design choices in single-ISA heterogeneous multicore architectures , 2013 .
[36] Mayank Daga,et al. Exploiting Coarse-Grained Parallelism in B+ Tree Searches on an APU , 2012, 2012 SC Companion: High Performance Computing, Networking Storage and Analysis.