Exploring Parallel Programming Models for Heterogeneous Computing Systems
[1] Alan Gray, et al. Porting and scaling OpenACC applications on massively-parallel, GPU-accelerated supercomputers, 2012.
[2] Rajkishore Barik, et al. Efficient Mapping of Irregular C++ Applications to Integrated GPUs, 2014, CGO '14.
[3] Wu-chun Feng, et al. Architecture-Aware Mapping and Optimization on a 1600-Core GPU, 2011, 2011 IEEE 17th International Conference on Parallel and Distributed Systems.
[4] Wu-chun Feng, et al. Towards accelerating molecular modeling via multi-scale approximation on a GPU, 2011, 2011 IEEE 1st International Conference on Computational Advances in Bio and Medical Sciences (ICCABS).
[5] Stephen A. Jarvis, et al. Accelerating Hydrocodes with OpenACC, OpenCL and CUDA, 2012, 2012 SC Companion: High Performance Computing, Networking Storage and Analysis.
[6] Stephen L. Olivier, et al. Toward an evolutionary task parallel integrated MPI + X programming model, 2015, PMAM@PPoPP.
[7] Ray W. Grout, et al. Hybridizing S3D into an Exascale application using OpenACC: An approach for moving to multi-petaflops and beyond, 2012, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis.
[8] Wen-mei W. Hwu, et al. Program optimization space pruning for a multithreaded GPU, 2008, CGO '08.
[9] Henri Calandra, et al. Experiences with OpenMP, PGI, HMPP and OpenACC Directives on ISO/TTI Kernels, 2012, 2012 SC Companion: High Performance Computing, Networking Storage and Analysis.
[10] Lucian Codrescu. Qualcomm Hexagon DSP: An architecture optimized for mobile multimedia and communications, 2013, 2013 IEEE Hot Chips 25 Symposium (HCS).
[11] Aaftab Munshi, et al. The OpenCL specification, 2009, 2009 IEEE Hot Chips 21 Symposium (HCS).
[12] Christian Terboven, et al. OpenACC - First Experiences with Real-World Applications, 2012, Euro-Par.
[13] Martin Schulz, et al. Exploring Traditional and Emerging Parallel Programming Models Using a Proxy Application, 2013, 2013 IEEE 27th International Symposium on Parallel and Distributed Processing.
[14] Ru Zhu. Speedup of Micromagnetic Simulations with C++ AMP on Graphics Processing Units, 2016, Computing in Science & Engineering.
[15] Ben Sander, et al. Applying AMD's Kaveri APU for heterogeneous computing, 2014, 2014 IEEE Hot Chips 26 Symposium (HCS).
[16] Mitesh R. Meswani, et al. Efficient breadth-first search on a heterogeneous processor, 2014, 2014 IEEE International Conference on Big Data (Big Data).
[17] Joseph L. Greathouse, et al. Efficient Sparse Matrix-Vector Multiplication on GPUs Using the CSR Storage Format, 2014, SC14: International Conference for High Performance Computing, Networking, Storage and Analysis.
[18] John D. Owens, et al. GPU Computing, 2008, Proceedings of the IEEE.
[19] Simon See, et al. Performance Portability Evaluation for OpenACC on Intel Knights Corner and NVIDIA Kepler (in Chinese), 2015, 计算机科学 (Computer Science).
[20] Hans Werner Meuer, et al. Top500 Supercomputer Sites, 1997.
[21] Satoshi Matsuoka, et al. CUDA vs OpenACC: Performance Case Studies with Kernel Benchmarks and a Memory-Bound CFD Application, 2013, 2013 13th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing.