Parallelization of 2D MPDATA EULAG algorithm on hybrid architectures with GPU accelerators
[1] Jim Jeffers. Intel® Xeon Phi™ Coprocessors , 2013 .
[2] David F. Bacon,et al. Compiler transformations for high-performance computing , 1994, CSUR.
[3] Hiroaki Kobayashi,et al. Automatic Tuning of CUDA Execution Parameters for Stencil Processing , 2010, Software Automatic Tuning, From Concepts to State-of-the-Art Results.
[4] Roman Wyrzykowski,et al. Parallel Implementation of Conjugate Gradient Method on Graphics Processors , 2009, PPAM.
[5] Gerhard Wellein,et al. Efficient multicore-aware parallelization strategies for iterative stencil computations , 2010, J. Comput. Sci.
[6] Alfonso Niño,et al. A Survey of Parallel Programming Models and Tools in the Multi and Many-core Era , 2012 .
[7] Kevin Skadron,et al. Performance modeling and automatic ghost zone optimization for iterative stencil loops on GPUs , 2009, ICS.
[8] Lukasz Szustak,et al. Using Blue Gene/P and GPUs to Accelerate Computations in the EULAG Model , 2011, LSSC.
[9] Piotr K. Smolarkiewicz,et al. FORWARD-IN-TIME DIFFERENCING FOR FLUIDS: SIMULATION OF GEOPHYSICAL TURBULENCE , 2002 .
[11] Matthias Christen,et al. Generating and auto-tuning parallel stencil codes , 2011 .
[12] Timothy G. Mattson,et al. OpenCL Programming Guide , 2011 .
[13] Cédric Augonnet,et al. Data-Aware Task Scheduling on Multi-accelerator Based Platforms , 2010, 2010 IEEE 16th International Conference on Parallel and Distributed Systems.
[14] José María Cela,et al. Introducing the Semi-stencil Algorithm , 2009, PPAM.
[15] Lukasz Szustak,et al. Model-driven adaptation of double-precision matrix multiplication to the Cell processor architecture , 2012, Parallel Comput.
[16] Samuel Williams,et al. Optimization and Performance Modeling of Stencil Computations on Modern Microprocessors , 2007, SIAM Rev..
[17] Piotr K. Smolarkiewicz,et al. Multidimensional positive definite advection transport algorithm: an overview , 2006 .
[18] Mikolaj Dobski,et al. Parallel and GPU Based Strategies for Selected CFD and Climate Modeling Models , 2011, ITEE.
[19] Chau-Wen Tseng,et al. Tiling Optimizations for 3D Scientific Computations , 2000, ACM/IEEE SC 2000 Conference (SC'00).
[20] Lukasz Szustak,et al. Parallelization of EULAG Model on Multicore Architectures with GPU Accelerators , 2011, PPAM.
[21] Chris H. Q. Ding,et al. A ghost cell expansion method for reducing communications in solving PDE problems , 2001, SC.
[22] Jack Dongarra,et al. Scientific Computing with Multicore and Accelerators , 2010, Chapman and Hall / CRC computational science series.
[23] A. R. Surve,et al. Energy Awareness in HPC: A Survey , 2013 .
[24] Gerhard Wellein,et al. Introduction to High Performance Computing for Scientists and Engineers , 2010, Chapman and Hall / CRC computational science series.
[25] Richard W. Vuduc,et al. Tuned and wildly asynchronous stencil kernels for hybrid CPU/GPU systems , 2009, ICS.
[26] Pradeep Dubey,et al. 3.5-D Blocking Optimization for Stencil Computations on Modern CPUs and GPUs , 2010, 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis.
[27] W. Grabowski,et al. The multidimensional positive definite advection transport algorithm: nonoscillatory option , 1990 .
[28] Gerhard Wellein,et al. Leveraging Shared Caches for Parallel Temporal Blocking of Stencil Codes on Multicore Processors and Clusters , 2010, Parallel Process. Lett.
[29] Samuel Williams,et al. An auto-tuning framework for parallel multicore stencil computations , 2010, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS).
[30] G. G. Stokes. "J." , 1890, The New Yale Book of Quotations.
[31] Michal Kierzynka,et al. CaKernel --A parallel application programming framework for heterogenous computing architectures , 2011 .
[32] Scott B. Baden,et al. Mint: realizing CUDA performance in 3D stencil methods with annotated C , 2011, ICS '11.
[33] Jack J. Dongarra,et al. A Note on Auto-tuning GEMM for GPUs , 2009, ICCS.
[34] Manuel Jesús Castro Díaz,et al. A multi‐GPU shallow‐water simulation with transport of contaminants , 2013, Concurr. Comput. Pract. Exp.
[35] Piotr K. Smolarkiewicz,et al. Towards petascale simulation of atmospheric circulations with soundproof equations , 2011 .
[36] Roberto Guerrieri,et al. Triangular Matrix Inversion on Heterogeneous Multicore Systems , 2012, IEEE Transactions on Parallel and Distributed Systems.
[37] Ken Kennedy. Fast greedy weighted fusion , 2000, ICS '00.
[38] Bogdan Rosa,et al. A Study on Parallel Performance of the EULAG F90/95 Code , 2011, PPAM.
[39] Leonid Oliker,et al. Impact of modern memory subsystems on cache optimizations for stencil computations , 2005, MSP '05.
[40] Samuel Williams,et al. Auto-Tuning Stencil Computations on Multicore and Accelerators , 2010, Scientific Computing with Multicore and Accelerators.
[41] Andrzej A. Wyszogrodzki,et al. Parallel Implementation and Scalability of Cloud Resolving EULAG Model , 2011, PPAM.
[42] Samuel Williams,et al. Stencil computation optimization and auto-tuning on state-of-the-art multicore architectures , 2008, 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis.
[43] L. Margolin,et al. MPDATA: A Finite-Difference Solver for Geophysical Flows , 1998 .
[44] Jack J. Dongarra,et al. From CUDA to OpenCL: Towards a performance-portable solution for multi-platform GPU programming , 2012, Parallel Comput.