Solving a trillion unknowns per second with HPGMG on Sunway TaihuLight
[1] P. Sadayappan,et al. High-performance code generation for stencil computations on GPU architectures , 2012, ICS '12.
[2] Weiguo Liu,et al. Redesigning CAM-SE for Peta-Scale Climate Modeling Performance and Ultra-High Resolution on Sunway TaihuLight , 2017, SC17: International Conference for High Performance Computing, Networking, Storage and Analysis.
[3] Naoya Maruyama,et al. Optimizing Stencil Computations for NVIDIA Kepler GPUs , 2014 .
[4] Samuel Williams,et al. The potential of the cell processor for scientific computing , 2005, CF '06.
[5] Samuel Williams,et al. Compiler generation and autotuning of communication-avoiding operators for geometric multigrid , 2013, 20th Annual International Conference on High Performance Computing.
[6] Daniel Ritter,et al. A Geometric Multigrid Solver on GPU Clusters , 2013 .
[7] Ninghui Sun,et al. Fast implementation of DGEMM on Fermi GPU , 2011, 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[8] Peter Kilpatrick,et al. A parallel pattern for iterative stencil + reduce , 2016, The Journal of Supercomputing.
[9] Samuel Williams,et al. Stencil computation optimization and auto-tuning on state-of-the-art multicore architectures , 2008, 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis.
[10] Samuel Williams,et al. Converting Stencils to Accumulations for Communication-Avoiding Optimization in Geometric Multigrid , 2014 .
[11] Li Kenli,et al. Implementing Molecular Dynamics Simulation on Sunway TaihuLight System , 2016 .
[12] Peng Zhang,et al. Towards Highly Efficient DGEMM on the Emerging SW26010 Many-Core Processor , 2017, 2017 46th International Conference on Parallel Processing (ICPP).
[13] Samuel Williams,et al. Compiler-based code generation and autotuning for geometric multigrid on GPU-accelerated supercomputers , 2017, Parallel Comput..
[14] Samuel Williams,et al. Auto-Tuning Stencil Computations on Multicore and Accelerators , 2010, Scientific Computing with Multicore and Accelerators.
[15] Chao Yang,et al. 26 PFLOPS Stencil Computations for Atmospheric Modeling on Sunway TaihuLight , 2017, 2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS).
[16] Xin Liu,et al. A Highly Effective Global Surface Wave Numerical Simulation with Ultra-High Resolution , 2016, SC16: International Conference for High Performance Computing, Networking, Storage and Analysis.
[17] Guoping Long,et al. Highly Optimized Code Generation for Stencil Codes with Computation Reuse for GPUs , 2016, Journal of Computer Science and Technology.
[18] Ulrich Rüde,et al. A Geometric Multigrid Solver on Tsubame 2.0 , 2011, Efficient Algorithms for Global Optimization Methods in Computer Vision.
[19] Pradeep Dubey,et al. 3.5-D Blocking Optimization for Stencil Computations on Modern CPUs and GPUs , 2010, 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis.
[20] Jack Dongarra,et al. High-performance conjugate-gradient benchmark , 2016 .
[21] Jack J. Dongarra,et al. The LINPACK Benchmark: past, present and future , 2003, Concurr. Comput. Pract. Exp..
[22] Jack J. Dongarra,et al. High-performance conjugate-gradient benchmark: A new metric for ranking high-performance computing systems , 2016, Int. J. High Perform. Comput. Appl..
[23] John Shalf,et al. HPGMG 1.0: A Benchmark for Ranking High Performance Computing Systems , 2014 .
[24] Peng Zhang,et al. Performance Evaluation of HPGMG on Tianhe-2: Early Experience , 2015, ICA3PP.
[25] Wei Cao,et al. CPU/GPU computing for a multi-block structured grid based high-order flow solver on a large heterogeneous system , 2013, Cluster Computing.
[26] Wei Ge,et al. The Sunway TaihuLight supercomputer: system and applications , 2016, Science China Information Sciences.
[27] Samuel Williams,et al. Optimization of geometric multigrid for emerging multi- and manycore processors , 2012, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis.
[28] Frank Mueller,et al. Auto-generation and auto-tuning of 3D stencil codes on GPU clusters , 2012, CGO '12.
[29] Xu Ping,et al. 10M-Core Scalable Fully-Implicit Solver for Nonhydrostatic Atmospheric Dynamics , 2016 .
[30] JaeHyuk Kwack,et al. HPCG and HPGMG benchmark tests on multiple program, multiple data (MPMD) mode on Blue Waters—A Cray XE6/XK7 hybrid system , 2018, Concurr. Comput. Pract. Exp..
[31] J. Ramanujam,et al. A framework for enhancing data reuse via associative reordering , 2014, PLDI.
[32] Chi Xue-bin,et al. Extreme-Scale Phase Field Simulations of Coarsening Dynamics on the Sunway TaihuLight Supercomputer , 2016 .
[33] Weiguo Liu,et al. 18.9-Pflops Nonlinear Earthquake Simulation on Sunway TaihuLight: Enabling Depiction of 18-Hz and 8-Meter Scenarios , 2017, SC17: International Conference for High Performance Computing, Networking, Storage and Analysis.
[34] Sergei Gorlatch,et al. High performance stencil code generation with Lift , 2018, CGO.