Communication-Computation Overlapping for Preconditioned Parallel Iterative Solvers with Dynamic Loop Scheduling
[1] James Demmel, et al. Avoiding communication in sparse matrix computations, 2008, 2008 IEEE International Symposium on Parallel and Distributed Processing.
[2] Nakajima Kengo, et al. Performance Evaluation of Pipelined CG Method, 2016.
[3] Taisuke Boku, et al. Performance and Scalability of Lightweight Multi-kernel Based Operating Systems, 2018, 2018 IEEE International Parallel and Distributed Processing Symposium (IPDPS).
[4] Yousef Saad, et al. Iterative methods for sparse linear systems, 2003.
[5] Wim Vanroose, et al. Hiding global synchronization latency in the preconditioned Conjugate Gradient algorithm, 2014, Parallel Comput.
[6] K. Nakajima. Parallel Iterative Solvers of GeoFEM with Selective Blocking Preconditioning for Nonlinear Contact Problems on the Earth Simulator, 2003, ACM/IEEE SC 2003 Conference (SC'03).
[7] Kengo Nakajima. Optimization of serial and parallel communications for parallel geometric multigrid method, 2014, 2014 20th IEEE International Conference on Parallel and Distributed Systems (ICPADS).
[8] Satoshi Matsuoka, et al. Peta-scale phase-field simulation for dendritic solidification on the TSUBAME 2.0 supercomputer, 2011, 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[9] Toshihiro Hanawa, et al. Communication-Computation Overlapping with Dynamic Loop Scheduling for Preconditioned Parallel Iterative Solvers on Multicore and Manycore Clusters, 2017, 2017 46th International Conference on Parallel Processing Workshops (ICPPW).
[10] Hiroshi Okuda, et al. Parallel Iterative Solvers for Unstructured Grids Using an OpenMP/MPI Hybrid Programming Model for the GeoFEM Platform on SMP Cluster Architectures, 2002, ISHPC.
[11] Barry F. Smith, et al. Domain Decomposition: Parallel Multilevel Methods for Elliptic Partial Differential Equations, 1996.
[12] Yutaka Ishikawa, et al. On the Scalability, Performance Isolation and Device Driver Transparency of the IHK/McKernel Hybrid Lightweight Kernel, 2016, 2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS).
[13] Gerhard Wellein, et al. A Unified Sparse Matrix Data Format for Efficient General Sparse Matrix-Vector Multiplication on Modern Processors with Wide SIMD Units, 2013, SIAM J. Sci. Comput.
[14] Yutaka Ishikawa, et al. Parallel Multigrid Methods on Manycore Clusters with IHK/McKernel, 2019, 2019 IEEE/ACM 10th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (ScalA).
[16] Arutyun Avetisyan, et al. Automatically Tuning Sparse Matrix-Vector Multiplication for GPU Architectures, 2010, HiPEAC.