Accelerating HPCG on Tianhe-2: A hybrid CPU-MIC algorithm
[1] Sandia Report,et al. Toward a New Metric for Ranking High Performance Computing Systems , 2013 .
[2] Pradeep Dubey,et al. Sparsifying Synchronization for High-Performance Shared-Memory Sparse Triangular Solver , 2014, ISC.
[3] Katherine Yelick,et al. OSKI: A library of automatically tuned sparse matrix kernels , 2005 .
[4] Xing Liu,et al. Efficient sparse matrix-vector multiplication on x86-based many-core processors , 2013, ICS '13.
[5] Samuel Williams,et al. Stencil computation optimization and auto-tuning on state-of-the-art multicore architectures , 2008, 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis.
[6] Jérémie Allard,et al. Parallel Dense Gauss-Seidel Algorithm on Many-Core Processors , 2009, 2009 11th IEEE International Conference on High Performance Computing and Communications.
[7] Gerhard Wellein,et al. Efficient Temporal Blocking for Stencil Computations by Multicore-Aware Wavefront Parallelization , 2009, 2009 33rd Annual IEEE International Computer Software and Applications Conference.
[8] Arutyun Avetisyan,et al. Automatically Tuning Sparse Matrix-Vector Multiplication for GPU Architectures , 2010, HiPEAC.
[9] Sandia Report,et al. HPCG Technical Specification , 2013 .
[10] Francisco Tirado,et al. Vectorization of multigrid codes using SIMD ISA extensions , 2003, Proceedings International Parallel and Distributed Processing Symposium.
[11] David G. Wonnacott,et al. Using time skewing to eliminate idle time due to memory bandwidth and network limitations , 2000, Proceedings 14th International Parallel and Distributed Processing Symposium. IPDPS 2000.
[12] Samuel Williams,et al. Optimization of geometric multigrid for emerging multi- and manycore processors , 2012, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis.
[13] Ravi Narayanaswamy,et al. Offload Compiler Runtime for the Intel® Xeon Phi Coprocessor , 2013, 2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum.
[14] Samuel Williams,et al. Optimization of sparse matrix-vector multiplication on emerging multicore platforms , 2009, Parallel Comput..
[15] Ulrich Rüde,et al. Cache-Aware Multigrid Methods for Solving Poisson's Equation in Two Dimensions , 2000, Computing.
[16] Chao Yang,et al. Optimizing and Scaling HPCG on Tianhe-2: Early Experience , 2014, ICA3PP.
[18] Erik Hagersten,et al. Multigrid and Gauss-Seidel smoothers revisited: parallelization on chip multiprocessors , 2006, ICS '06.