A Unified Programming Model for Intra- and Inter-Node Offloading on Xeon Phi Clusters
[1] Federico Silla, et al. rCUDA: Reducing the number of GPU-based accelerators in high performance clusters, 2010, 2010 International Conference on High Performance Computing & Simulation.
[2] Ravi Narayanaswamy, et al. Offload Compiler Runtime for the Intel® Xeon Phi Coprocessor, 2013, 2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum.
[3] Matthias Noack. HAM - Heterogenous Active Messages for Efficient Offloading on the Intel Xeon Phi, 2014.
[4] Mitsuhisa Sato, et al. TACO: prototyping high-level object-oriented programming constructs by means of template based programming techniques, 2001, SIGP.
[5] Dhabaleswar K. Panda, et al. Efficient Intra-node Communication on Intel-MIC Clusters, 2013, 2013 13th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing.
[6] Wu-chun Feng, et al. VOCL: An optimized environment for transparent virtualization of graphics processing units, 2012, 2012 Innovative Parallel Computing (InPar).
[7] Amnon Barak, et al. A package for OpenCL based heterogeneous computing on clusters with many GPU devices, 2010, 2010 IEEE International Conference On Cluster Computing Workshops and Posters (CLUSTER WORKSHOPS).
[8] Johannes Schmidt-Ehrenberg, et al. Metastable Conformations via successive Perron-Cluster Cluster Analysis of dihedrals, 2002.
[10] Dhabaleswar K. Panda, et al. MVAPICH-PRISM: A proxy-based communication framework using InfiniBand and SCIF for Intel MIC clusters, 2013, 2013 SC - International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[11] William H. Press, et al. Numerical Recipes 3rd Edition: The Art of Scientific Computing, 2007.
[12] Wen-mei W. Hwu, et al. GPU Computing Gems Jade Edition, 2011.
[13] Wen-mei W. Hwu, et al. GPU Computing Gems Emerald Edition, 2011.
[14] Michael Lang, et al. The reverse-acceleration model for programming petascale hybrid systems, 2009, IBM J. Res. Dev.
[15] Aaftab Munshi, et al. The OpenCL specification, 2009, 2009 IEEE Hot Chips 21 Symposium (HCS).
[16] James Reinders, et al. Intel Xeon Phi Coprocessor High Performance Programming, 2013.
[17] Thomas A. Halgren. Merck molecular force field. I. Basis, form, scope, parameterization, and performance of MMFF94, 1996, J. Comput. Chem.
[18] Yutaka Ishikawa, et al. Direct MPI Library for Intel Xeon Phi Co-Processors, 2013, 2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum.
[19] Unix System Laboratories. System V Application Binary Interface, 1993.
[20] T. Steinke, et al. On Improving the Performance of Multi-threaded CUDA Applications with Concurrent Kernel Execution by Kernel Reordering, 2012, 2012 Symposium on Application Accelerators in High Performance Computing.
[21] Dhabaleswar K. Panda, et al. MIC-RO: enabling efficient remote offload on heterogeneous many integrated core (MIC) clusters with InfiniBand, 2013, ICS '13.
[22] Michael Klemm, et al. From GPGPU to Many-Core: Nvidia Fermi and Intel Many Integrated Core Architecture, 2012, Computing in Science & Engineering.