IMPACC: A Tightly Integrated MPI+OpenACC Framework Exploiting Shared Memory Parallelism
[1] Tao Yang,et al. Compile/run-time support for threaded MPI execution on multiprogrammed shared memory machines , 1999, PPoPP '99.
[2] Cédric Augonnet,et al. StarPU-MPI: Task Programming over Clusters of Machines Enhanced with Accelerators , 2012, EuroMPI.
[3] Franck Cappello,et al. MPI versus MPI+OpenMP on the IBM SP for the NAS Benchmarks , 2000, ACM/IEEE SC 2000 Conference (SC'00).
[4] Wu-chun Feng,et al. MPI-ACC: An Integrated and Extensible Approach to Data Movement in Accelerator-based Systems , 2012, 2012 IEEE 14th International Conference on High Performance Computing and Communication & 2012 IEEE 9th International Conference on Embedded Software and Systems.
[5] Jaejin Lee,et al. Performance characterization of the NAS Parallel Benchmarks in OpenCL , 2011, 2011 IEEE International Symposium on Workload Characterization (IISWC).
[6] David H. Bailey,et al. The NAS parallel benchmarks summary and preliminary results , 1991, Proceedings of the 1991 ACM/IEEE Conference on Supercomputing (Supercomputing '91).
[7] David Tarditi,et al. Accelerator: using data parallelism to program GPUs for general-purpose uses , 2006, ASPLOS XII.
[8] Torsten Hoefler,et al. Hybrid MPI: Efficient message passing for multi-core systems , 2013, 2013 SC - International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[9] Eduard Ayguadé,et al. An Extension of the StarSs Programming Model for Platforms with Multiple GPUs , 2009, Euro-Par.
[10] Alejandro Duran,et al. Productive Programming of GPU Clusters with OmpSs , 2012, 2012 IEEE 26th International Parallel and Distributed Processing Symposium.
[11] Mel Gorman,et al. Understanding the Linux Virtual Memory Manager , 2004.
[12] Steven L. Scott,et al. Synchronization and communication in the T3E multiprocessor , 1996, ASPLOS VII.
[13] Rajeev Thakur,et al. Enabling MPI interoperability through flexible communication endpoints , 2013, EuroMPI.
[14] Georg Hager,et al. Hybrid MPI/OpenMP Parallel Programming on Clusters of Multi-Core SMP Nodes , 2009, 2009 17th Euromicro International Conference on Parallel, Distributed and Network-based Processing.
[15] Brice Goglin,et al. KNEM: A generic and scalable kernel-assisted intra-node MPI communication framework , 2013, J. Parallel Distributed Comput.
[16] Federico Silla,et al. Enabling CUDA acceleration within virtual machines using rCUDA , 2011, 2011 18th International Conference on High Performance Computing.
[17] Xingfu Wu,et al. Performance characteristics of hybrid MPI/OpenMP implementations of NAS parallel benchmarks SP and BT on large-scale multicore supercomputers , 2011, PERV.
[18] Eduard Ayguadé,et al. OmpSs-OpenCL Programming Model for Heterogeneous Systems , 2012, LCPC.
[19] Seyong Lee,et al. Early evaluation of directive-based GPU programming models for productive exascale computing , 2012, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis.
[20] Rajeev Thakur,et al. Fine-Grained Multithreading Support for Hybrid Threaded MPI Programming , 2010, Int. J. High Perform. Comput. Appl.
[21] Henri E. Bal,et al. Cashmere: Heterogeneous Many-Core Computing , 2015, 2015 IEEE International Parallel and Distributed Processing Symposium.
[22] Thomas Steinke,et al. A Unified Programming Model for Intra- and Inter-Node Offloading on Xeon Phi Clusters , 2014, SC14: International Conference for High Performance Computing, Networking, Storage and Analysis.
[23] Jeffrey S. Vetter,et al. Performance Implications of Nonuniform Device Topologies in Scalable Heterogeneous Architectures , 2011, IEEE Micro.
[24] Rudolf Eigenmann,et al. A hybrid approach of OpenMP for clusters , 2012, PPoPP '12.
[25] Michael Garland,et al. Designing a unified programming model for heterogeneous machines , 2012, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis.
[26] Jack J. Dongarra,et al. Unified Development for Mixed Multi-GPU and Multi-coprocessor Environments Using a Lightweight Runtime Environment , 2014, 2014 IEEE 28th International Parallel and Distributed Processing Symposium.
[27] Thomas Fahringer,et al. LibWater: heterogeneous distributed computing made easy , 2013, ICS '13.
[28] Dhabaleswar K. Panda,et al. MVAPICH-PRISM: A proxy-based communication framework using InfiniBand and SCIF for Intel MIC clusters , 2013, 2013 SC - International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[29] Seyong Lee,et al. OpenARC: open accelerator research compiler for directive-based, efficient heterogeneous computing , 2014, HPDC '14.
[30] Géraud Krawezik. Performance comparison of MPI and three OpenMP programming styles on shared memory multiprocessors , 2003, SPAA '03.
[31] Ian Karlin,et al. LULESH Programming Model and Performance Ports Overview , 2012.
[32] Torsten Hoefler,et al. Ownership passing: efficient distributed memory programming on multi-core systems , 2013, PPoPP '13.
[33] Jeffrey S. Vetter,et al. Contemporary High Performance Computing - From Petascale toward Exascale , 2019, Chapman and Hall / CRC computational science series.
[34] Cédric Augonnet,et al. StarPU: a unified platform for task scheduling on heterogeneous multicore architectures , 2011, Concurr. Comput. Pract. Exp.
[35] Sayantan Sur,et al. MVAPICH2-GPU: optimized GPU to GPU communication for InfiniBand clusters , 2011, Computer Science - Research and Development.
[36] Saman P. Amarasinghe,et al. Portable performance on heterogeneous architectures , 2013, ASPLOS '13.
[37] Pavan Balaji,et al. MT-MPI: multithreaded MPI for many-core environments , 2014, ICS '14.
[38] John E. Stone,et al. An asymmetric distributed shared memory model for heterogeneous parallel systems , 2010, ASPLOS XV.
[39] Pradeep Dubey,et al. Beacon: Deployment and Application of Intel Xeon Phi Coprocessors for Scientific Computing , 2015, Comput. Sci. Eng.
[40] Samuel Williams,et al. The Landscape of Parallel Computing Research: A View from Berkeley , 2006.
[41] Jungwon Kim,et al. SnuCL: an OpenCL framework for heterogeneous CPU/GPU clusters , 2012, ICS '12.
[42] Rosa M. Badia,et al. CellSs: a Programming Model for the Cell BE Architecture , 2006, ACM/IEEE SC 2006 Conference (SC'06).