Stencil computations on heterogeneous platforms for the Jacobi method: GPUs versus Cell BE
José L. Abellán | Manuel E. Acacio | José M. García | Manuel Ujaldon | José M. Cecilia | Juan Fernández Peinador
[1] John D. Owens, et al. GPU Computing, 2008, Proceedings of the IEEE.
[2] Bruce P. Lester. The art of parallel programming, 1993.
[3] Patricia J. Teller, et al. Proceedings of the 2008 ACM/IEEE conference on Supercomputing, 2008, HiPC 2008.
[4] Scott Lathrop, et al. Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, 2011, International Conference on High Performance Computing.
[5] Jens H. Krüger, et al. A Survey of General-Purpose Computation on Graphics Hardware, 2007, Eurographics.
[6] Sanjay V. Rajopadhye, et al. Towards Optimal Multi-level Tiling for Stencil Computations, 2007, 2007 IEEE International Parallel and Distributed Processing Symposium.
[7] H. Peter Hofstee, et al. Introduction to the Cell multiprocessor, 2005, IBM J. Res. Dev.
[8] Scott B. Baden, et al. Mint: realizing CUDA performance in 3D stencil methods with annotated C, 2011, ICS '11.
[9] Rodrigo Weber dos Santos, et al. Comparing CUDA and OpenGL implementations for a Jacobi iteration, 2009, 2009 International Conference on High Performance Computing & Simulation.
[10] Samuel Williams, et al. The Landscape of Parallel Computing Research: A View from Berkeley, 2006.
[11] José L. Abellán, et al. Characterizing the Basic Synchronization and Communication Operations in Dual Cell-Based Blades, 2008, ICCS.
[12] Samuel Williams, et al. Stencil computation optimization and auto-tuning on state-of-the-art multicore architectures, 2008, 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis.
[13] M.D. McCool, et al. Scalable Programming Models for Massively Multicore Processors, 2008, Proceedings of the IEEE.
[14] James Demmel, et al. Applied Numerical Linear Algebra, 1997.
[15] Wang Gui-bin, et al. Optimizing stencil application on multi-thread GPU architecture using stream programming model, 2010, ARCS 2010.
[16] John E. Stone, et al. OpenCL: A Parallel Programming Standard for Heterogeneous Computing Systems, 2010, Computing in Science & Engineering.
[17] Manuel E. Acacio, et al. Fast and Efficient Synchronization and Communication Collective Primitives for Dual Cell-Based Blades, 2009, Euro-Par.
[18] Peter Messmer, et al. Parallel data-locality aware stencil computations on modern micro-architectures, 2009, 2009 IEEE International Symposium on Parallel & Distributed Processing.
[19] Richard W. Vuduc, et al. Tuned and wildly asynchronous stencil kernels for hybrid CPU/GPU systems, 2009, ICS.
[20] Satoshi Matsuoka, et al. Physis: An implicitly parallel programming model for stencil computations on large-scale GPU-accelerated supercomputers, 2011, 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[21] Erik Lindholm, et al. NVIDIA Tesla: A Unified Graphics and Computing Architecture, 2008, IEEE Micro.