Automatic Resource Scheduling with Latency Hiding for Parallel Stencil Applications on GPGPU Clusters
Ryutaro Himeno | Hideaki Komatsu | Shigeho Noda | Masana Murase | Munehiro Doi | Kumiko Maeda
[1] Mendel Rosenblum, et al. Streamware: programming general-purpose multicore processors using streams, 2008, ASPLOS.
[2] Patrick Crowley, et al. Auto-pipe and the X language: a pipeline design tool and description language, 2006, Proceedings 20th IEEE International Parallel & Distributed Processing Symposium.
[3] William Thies, et al. StreamIt: A Language for Streaming Applications, 2002, CC.
[4] William J. Dally, et al. Sequoia: Programming the Memory Hierarchy, 2006, ACM/IEEE SC 2006 Conference (SC'06).
[5] Satoshi Matsuoka, et al. GPU accelerated computing—from hype to mainstream, the rebirth of vector computing, 2009.
[6] Samuel Williams, et al. An auto-tuning framework for parallel multicore stencil computations, 2010, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS).
[7] Teresa H. Y. Meng, et al. Merge: a programming model for heterogeneous multi-core systems, 2008, ASPLOS.
[8] Massimiliano Fatica, et al. Implementing the Himeno benchmark with CUDA on GPU clusters, 2010, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS).
[9] James Reinders, et al. Intel threading building blocks - outfitting C++ for multi-core processor parallelism, 2007.
[10] Ian T. Foster, et al. Globus: a Metacomputing Infrastructure Toolkit, 1997, Int. J. High Perform. Comput. Appl.
[11] Milind Girkar, et al. EXOCHI: architecture and programming environment for a heterogeneous multi-core multithreaded system, 2007, PLDI '07.
[12] Philip S. Yu, et al. SPADE: the System S declarative stream processing engine, 2008, SIGMOD Conference.
[13] Ryutaro Himeno, et al. A parallel programming framework orchestrating multiple languages and architectures, 2011, CF '11.
[14] Hans P. Zima, et al. The Cascade high productivity language, 2004.
[15] Bradley C. Kuszmaul, et al. Cilk: an efficient multithreaded runtime system, 1995, PPOPP '95.
[16] Pat Hanrahan, et al. Brook for GPUs: stream computing on graphics hardware, 2004, SIGGRAPH 2004.
[17] Victor Luchangco, et al. The Fortress Language Specification Version 1.0, 2007.
[18] Umakishore Ramachandran, et al. Streamline: a scheduling heuristic for streaming applications on the grid, 2006, Electronic Imaging.
[19] Pradeep Dubey, et al. 3.5-D Blocking Optimization for Stencil Computations on Modern CPUs and GPUs, 2010, 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis.
[20] Cheng Wu, et al. An integrated resource management and scheduling system for grid data streaming applications, 2008, 2008 9th IEEE/ACM International Conference on Grid Computing.
[21] Vivek Sarkar, et al. X10: an object-oriented approach to non-uniform cluster computing, 2005, OOPSLA '05.
[22] Steven J. Deitz, et al. Abstractions for dynamic data distribution, 2004.