A framework for efficient and scalable execution of domain-specific templates on GPUs
暂无分享,去创建一个
[1] Kurt Keutzer,et al. A map reduce framework for programming graphics processors , 2010 .
[2] Naga K. Govindaraju,et al. Mars: A MapReduce Framework on graphics processors , 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).
[3] Steven S. Lumetta,et al. CUBA: an architecture for efficient CPU/co-processor data communication , 2008, ICS '08.
[4] Pat Hanrahan,et al. Brook for GPUs: stream computing on graphics hardware , 2004, ACM Trans. Graph..
[5] David Tarditi,et al. Accelerator: using data parallelism to program GPUs for general-purpose uses , 2006, ASPLOS XII.
[6] Teresa H. Y. Meng,et al. Merge: a programming model for heterogeneous multi-core systems , 2008, ASPLOS.
[7] William J. Dally,et al. Compilation for explicitly managed memory hierarchies , 2007, PPOPP.
[8] Wen-mei W. Hwu,et al. Program optimization space pruning for a multithreaded gpu , 2008, CGO '08.
[9] Niklas Sörensson,et al. Translating Pseudo-Boolean Constraints into SAT , 2006, J. Satisf. Boolean Model. Comput..
[10] Milind Girkar,et al. EXOCHI: architecture and programming environment for a heterogeneous multi-core multithreaded system , 2007, PLDI '07.
[11] Wen-mei W. Hwu,et al. Optimization principles and application performance evaluation of a multithreaded GPU using CUDA , 2008, PPoPP.
[12] John Meyer,et al. Grading nuclear pleomorphism on histological micrographs , 2008, 2008 19th International Conference on Pattern Recognition.
[13] Rudolf Eigenmann,et al. OpenMP to GPGPU: a compiler framework for automatic translation and optimization , 2009, PPoPP '09.
[14] Bingsheng He,et al. Mars: Accelerating MapReduce with Graphics Processors , 2011, IEEE Transactions on Parallel and Distributed Systems.