Automatic Parameter Tuning of Three-Dimensional Tiled FDTD Kernel
Hiroshi Nakashima | Takeshi Iwashita | Tasuku Hiraishi | Takeshi Minami | Motoharu Hibino
[1] Uday Bondhugula, et al. A practical automatic polyhedral parallelizer and locality optimizer, 2008, PLDI '08.
[2] Samuel Williams, et al. Auto-Tuning the 27-point Stencil for Multicore, 2009.
[3] David V. Thiel, et al. FDTD analysis of dielectric-embedded electronically switched multiple-beam (DE-ESMB) antenna array, 2002.
[4] Satoshi Matsuoka, et al. Physis: An implicitly parallel programming model for stencil computations on large-scale GPU-accelerated supercomputers, 2011, 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[5] Pradeep Dubey, et al. 3.5-D Blocking Optimization for Stencil Computations on Modern CPUs and GPUs, 2010, 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis.
[6] Hans-Peter Seidel, et al. Cache oblivious parallelograms in iterative stencil computations, 2010, ICS '10.
[7] Vivek Sarkar, et al. Analytical Bounds for Optimal Tile Size Selection, 2012, CC.
[8] Samuel Williams, et al. Stencil computation optimization and auto-tuning on state-of-the-art multicore architectures, 2008, 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis.
[9] Satoshi Matsuoka, et al. A Multi-Level Optimization Method for Stencil Computation on the Domain that is Bigger than Memory Capacity of GPU, 2013, 2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum.
[10] Gerhard Wellein, et al. Multicore-aware parallel temporal blocking of stencil codes for shared and distributed memory, 2009, 2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW).
[11] Samuel Williams, et al. Stencil computation optimization and auto-tuning on state-of-the-art multicore architectures, 2008, 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis.
[12] Katherine Yelick, et al. OSKI: A library of automatically tuned sparse matrix kernels, 2005.
[13] Yuefan Deng, et al. New trends in high performance computing, 2001, Parallel Computing.
[14] Michael Wolfe, et al. More iteration space tiling, 1989, Proceedings of the 1989 ACM/IEEE Conference on Supercomputing (Supercomputing '89).
[15] G. Ala, et al. Numerical simulation of radiated EMI in 42 V electrical automotive architectures, 2006, IEEE Transactions on Magnetics.
[16] Guang R. Gao, et al. Mapping the FDTD Application to Many-Core Chip Architectures, 2009, 2009 International Conference on Parallel Processing.
[17] Vincent Fusco, et al. A parallel implementation of the finite difference time-domain algorithm, 1995.
[18] Gerhard Wellein, et al. Efficient Temporal Blocking for Stencil Computations by Multicore-Aware Wavefront Parallelization, 2009, 2009 33rd Annual IEEE International Computer Software and Applications Conference.
[19] David G. Wonnacott, et al. Using time skewing to eliminate idle time due to memory bandwidth and network limitations, 2000, Proceedings 14th International Parallel and Distributed Processing Symposium. IPDPS 2000.