26 PFLOPS Stencil Computations for Atmospheric Modeling on Sunway TaihuLight
Chao Yang | Fangfang Liu | Ping Xu | Wei Xue | Lin Gan | Haohuan Fu | Wenjing Ma | Xinliang Wang | Yulong Ao
[1] Pawel Gepner,et al. Adaptation of MPDATA Heterogeneous Stencil Computation to Intel Xeon Phi Coprocessor , 2015, Sci. Program..
[2] P. Lauritzen. Numerical techniques for global atmospheric models , 2011 .
[3] Pradeep Dubey,et al. Debunking the 100X GPU vs. CPU myth: an evaluation of throughput computing on CPU and GPU , 2010, ISCA.
[4] James Demmel,et al. the Parallel Computing Landscape , 2022 .
[5] Hirofumi Tomita,et al. Performance Analysis and Optimization of Nonhydrostatic ICosahedral Atmospheric Model (NICAM) on the K Computer and TSUBAME2.5 , 2016, PASC.
[6] Satoshi Matsuoka,et al. Peta-scale phase-field simulation for dendritic solidification on the TSUBAME 2.0 supercomputer , 2011, 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[7] Chau-Wen Tseng,et al. Tiling Optimizations for 3D Scientific Computations , 2000, ACM/IEEE SC 2000 Conference (SC'00).
[8] Samuel Williams,et al. The potential of the cell processor for scientific computing , 2005, CF '06.
[9] Marcel Bauer,et al. Numerical Methods for Partial Differential Equations , 1994 .
[10] D. Keyes,et al. Jacobian-free Newton-Krylov methods: a survey of approaches and applications , 2004 .
[11] David G. Wonnacott,et al. Using time skewing to eliminate idle time due to memory bandwidth and network limitations , 2000, Proceedings 14th International Parallel and Distributed Processing Symposium. IPDPS 2000.
[12] Alan Norton,et al. Petascale WRF simulation of Hurricane Sandy: Deployment of NCSA's Cray XE6 Blue Waters , 2013, 2013 SC - International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[13] Satoshi Matsuoka,et al. An 80-Fold Speedup, 15.0 TFlops Full GPU Acceleration of Non-Hydrostatic Weather Model ASUCA Production Code , 2010, 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis.
[14] W. K. Anderson,et al. Achieving High Sustained Performance in an Unstructured Mesh CFD Application , 1999, ACM/IEEE SC 1999 Conference (SC'99).
[15] Paulius Micikevicius,et al. 3D finite difference computation on GPUs using CUDA , 2009, GPGPU-2.
[16] Samuel Williams,et al. Implicit and explicit optimizations for stencil computations , 2006, MSPC '06.
[17] Chao Yang,et al. A peta-scalable CPU-GPU algorithm for global atmospheric simulations , 2013, PPoPP '13.
[18] Wei Ge,et al. The Sunway TaihuLight supercomputer: system and applications , 2016, Science China Information Sciences.
[19] Nikolaus A. Adams,et al. 11 PFLOP/s simulations of cloud cavitation collapse , 2013, 2013 SC - International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[20] Samuel Williams,et al. Stencil computation optimization and auto-tuning on state-of-the-art multicore architectures , 2008, 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis.
[21] Franz Franchetti,et al. Data Layout Transformation for Stencil Computations on Short-Vector SIMD Architectures , 2011, CC.
[22] Chao Yang,et al. Ultra-Scalable CPU-MIC Acceleration of Mesoscale Atmospheric Modeling on Tianhe-2 , 2015, IEEE Transactions on Computers.
[23] Chi-Wang Shu,et al. Strong Stability-Preserving High-Order Time Discretization Methods , 2001, SIAM Rev..
[24] Pradeep Dubey,et al. 3.5-D Blocking Optimization for Stencil Computations on Modern CPUs and GPUs , 2010, 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis.
[25] Mark A. Taylor,et al. Progress towards accelerating HOMME on hybrid multi-core systems , 2013, Int. J. High Perform. Comput. Appl..
[26] Manish Vachharajani,et al. GPU acceleration of numerical weather prediction , 2008, 2008 IEEE International Symposium on Parallel and Distributed Processing.
[27] P. Lauritzen,et al. Atmospheric Transport Schemes: Desirable Properties and a Semi-Lagrangian View on Finite-Volume Discretizations , 2011 .
[28] John Shalf,et al. HPGMG 1.0: A Benchmark for Ranking High Performance Computing Systems , 2014 .
[29] Takashi Shimokawabe,et al. 145 TFlops Performance on 3990 GPUs of TSUBAME 2.0 Supercomputer for an Operational Weather Prediction , 2011, ICCS.
[30] Samuel Williams,et al. Optimization and Performance Modeling of Stencil Computations on Modern Microprocessors , 2007, SIAM Rev..
[31] Gerhard Wellein,et al. Efficient Temporal Blocking for Stencil Computations by Multicore-Aware Wavefront Parallelization , 2009, 2009 33rd Annual IEEE International Computer Software and Applications Conference.
[32] Volker Strumpen,et al. Cache oblivious stencil computations , 2005, ICS '05.
[33] Gerhard Wellein,et al. Efficient multicore-aware parallelization strategies for iterative stencil computations , 2010, J. Comput. Sci..
[34] Samuel Williams,et al. Roofline: an insightful visual performance model for multicore architectures , 2009, CACM.
[35] Chao Yang,et al. 10M-Core Scalable Fully-Implicit Solver for Nonhydrostatic Atmospheric Dynamics , 2016, SC16: International Conference for High Performance Computing, Networking, Storage and Analysis.
[36] Tom Henderson,et al. Running the NIM Next-Generation Weather Model on GPUs , 2010, 2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing.
[37] Christiane Jablonowski,et al. Operator-Split Runge-Kutta-Rosenbrock Methods for Nonhydrostatic Atmospheric Models , 2012 .