A Multilevel Parallelization Framework for High-Order Stencil Computations
Weiqiang Wang | Liu Peng | Rajiv K. Kalia | Priya Vashishta | Aiichiro Nakano | Hikmet Dursun | Ken-ichi Nomura | Richard Seymour
[1] Allen Taflove, et al. Computational Electrodynamics: The Finite-Difference Time-Domain Method, 1995.
[2] Yves Robert, et al. Determining the idle time of a tiling: new results, 1997, Proceedings 1997 International Conference on Parallel Architectures and Compilation Techniques.
[3] Sanjay V. Rajopadhye, et al. Towards Optimal Multi-level Tiling for Stencil Computations, 2007, 2007 IEEE International Parallel and Distributed Processing Symposium.
[4] Rainer Bleck, et al. Salinity-driven Thermocline Transients in a Wind- and Thermohaline-forced Isopycnic Coordinate Model of the North Atlantic, 1992.
[5] Samuel Williams, et al. Implicit and explicit optimizations for stencil computations, 2006, MSPC '06.
[6] Yousef Saad, et al. Parallel methods and tools for predicting material properties, 2000, Comput. Sci. Eng.
[7] Volker Strumpen, et al. Cache oblivious stencil computations, 2005, ICS '05.
[8] Chau-Wen Tseng, et al. Tiling Optimizations for 3D Scientific Computations, 2000, ACM/IEEE SC 2000 Conference (SC'00).
[9] David G. Wonnacott, et al. Using time skewing to eliminate idle time due to memory bandwidth and network limitations, 2000, Proceedings 14th International Parallel and Distributed Processing Symposium. IPDPS 2000.
[10] Linda G. Shapiro, et al. Computer and Robot Vision, 1991.
[11] J. Dongarra, et al. The Impact of Multicore on Computational Science Software, 2007.
[12] Patricia J. Teller, et al. Proceedings of the 2008 ACM/IEEE conference on Supercomputing, 2008, HiPC 2008.
[13] Samuel Williams, et al. The Landscape of Parallel Computing Research: A View from Berkeley, 2006.
[14] Guy L. Steele, et al. Fortran at ten gigaflops: the connection machine convolution compiler, 1991, PLDI '91.
[15] Scott Pakin, et al. Entering the petaflop era: the architecture and performance of Roadrunner, 2008, HiPC 2008.
[16] Jack Dongarra, et al. Automatic Blocking of Nested Loops, 1990.
[17] Louis Turcotte, et al. Proceedings of the 2000 ACM/IEEE conference on Supercomputing, 2000.
[18] Samuel Williams, et al. Stencil computation optimization and auto-tuning on state-of-the-art multicore architectures, 2008, 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis.
[19] Samuel Williams, et al. Lattice Boltzmann simulation optimization on leading multicore platforms, 2008, 2008 IEEE International Symposium on Parallel and Distributed Processing.
[20] Scott Pakin. Receiver-initiated message passing over RDMA Networks, 2008, 2008 IEEE International Symposium on Parallel and Distributed Processing.
[21] Melinda Piket-May, et al. 9 – Computational Electromagnetics: The Finite-Difference Time-Domain Method, 2005.
[22] A. Nakano, et al. Divide-and-conquer density functional theory on hierarchical real-space grids: Parallel implementation and applications, 2008.
[23] Ali-Reza Adl-Tabatabai, et al. Proceedings of the 2006 workshop on Memory System Performance and Correctness, San Jose, California, USA, October 11, 2006, 2006, Memory System Performance and Correctness.
[24] William Kramer, et al. Proceedings of the 2005 ACM/IEEE conference on Supercomputing, 2005.
[25] A. Nakano, et al. Multiresolution molecular dynamics algorithm for realistic materials modeling on parallel computers, 1994.