Reverse Offload Programming on Heterogeneous Systems
Canqun Yang | Yang Liu | Fang Wang | Liang Deng | Dan Zhao | Wenxiang Yang | Cheng Chen