Compiler and runtime support for enabling generalized reduction computations on heterogeneous parallel configurations
[1] Sriram Krishnamoorthy,et al. Solving Large, Irregular Graph Problems Using Adaptive Work-Stealing, 2008, 2008 37th International Conference on Parallel Processing.
[2] Uday Bondhugula,et al. A compiler framework for optimization of affine loop nests for gpgpus , 2008, ICS '08.
[3] W. Zhao,et al. Performance analysis of FCFS and improved FCFS scheduling algorithms for dynamic real-time computer systems , 1989, Proceedings. Real-Time Systems Symposium.
[4] Laxmikant V. Kalé,et al. Towards a framework for abstracting accelerators in parallel applications: experience with cell , 2009, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis.
[5] Lars Ole Andersen,et al. Program Analysis and Specialization for the C Programming Language , 2005.
[6] Vikram S. Adve,et al. LLVM: a compilation framework for lifelong program analysis & transformation , 2004, International Symposium on Code Generation and Optimization, 2004. CGO 2004.
[7] Milind Girkar,et al. EXOCHI: architecture and programming environment for a heterogeneous multi-core multithreaded system , 2007, PLDI '07.
[8] Avi Mendelson,et al. Programming model for a heterogeneous x86 platform , 2009, PLDI '09.
[9] David Tarditi,et al. Accelerator: using data parallelism to program GPUs for general-purpose uses , 2006, ASPLOS XII.
[10] Anil K. Jain,et al. Algorithms for Clustering Data , 1988.
[11] Christoforos E. Kozyrakis,et al. Evaluating MapReduce for Multi-core and Multiprocessor Systems , 2007, 2007 IEEE 13th International Symposium on High Performance Computer Architecture.
[12] Norman P. Jouppi,et al. Heterogeneous chip multiprocessors , 2005, Computer.
[13] Richard W. Vuduc,et al. Tuned and wildly asynchronous stencil kernels for hybrid CPU/GPU systems , 2009, ICS.
[14] Galen C. Hunt,et al. Helios: heterogeneous multiprocessing with satellite kernels , 2009, SOSP '09.
[15] Karl Pearson, F.R.S. LIII. On lines and planes of closest fit to systems of points in space , 1901.
[16] Wen-mei W. Hwu,et al. Program optimization space pruning for a multithreaded gpu , 2008, CGO '08.
[17] Ruoming Jin,et al. Shared Memory Parallelization of Data Mining Algorithms: Techniques, Programming Interface, and Performance , 2002.
[18] Sanjay Ghemawat,et al. MapReduce: Simplified Data Processing on Large Clusters , 2004, OSDI.
[19] Naga K. Govindaraju,et al. Mars: A MapReduce Framework on graphics processors , 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).
[20] Rudolf Eigenmann,et al. OpenMP to GPGPU: a compiler framework for automatic translation and optimization , 2009, PPoPP '09.
[21] Gagan Agrawal,et al. A translation system for enabling data mining applications on GPUs , 2009, ICS.
[22] Hyesoon Kim,et al. Qilin: Exploiting parallelism on heterogeneous multiprocessors with adaptive mapping , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[23] Wen-mei W. Hwu,et al. CUDA-Lite: Reducing GPU Programming Complexity , 2008, LCPC.
[24] Sriram Krishnamoorthy,et al. Scalable work stealing , 2009, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis.
[25] Anand Raghunathan,et al. A framework for efficient and scalable execution of domain-specific templates on GPUs , 2009, 2009 IEEE International Symposium on Parallel & Distributed Processing.
[26] Joseph M. Lancaster,et al. Visions for application development on hybrid computing systems , 2008, Parallel Comput..
[27] Leslie Ann Goldberg,et al. The Natural Work-Stealing Algorithm is Stable , 2001, SIAM J. Comput..
[28] Robert D. Blumofe,et al. Scheduling multithreaded computations by work stealing , 1994, Proceedings 35th Annual Symposium on Foundations of Computer Science.