Locality Optimization of Stencil Applications Using Data Dependency Graphs
[1] Guang R. Gao,et al. Optimized Dense Matrix Multiplication on a Many-Core Architecture , 2010, Euro-Par.
[2] Guang R. Gao,et al. Diamond Tiling: A Tiling Framework for Time-iterated Scientific Applications , 2009 .
[3] Guang R. Gao,et al. Mapping the LU decomposition on a many-core architecture: challenges and solutions , 2009, CF '09.
[4] K. Yee. Numerical solution of initial boundary value problems involving Maxwell's equations in isotropic media , 1966 .
[5] Guang R. Gao,et al. Mapping the FDTD Application to Many-Core Chip Architectures , 2009, 2009 International Conference on Parallel Processing.
[6] Uday Bondhugula,et al. Effective automatic parallelization of stencil computations , 2007, PLDI '07.
[7] Michael Wolfe,et al. More iteration space tiling , 1989, Proceedings of the 1989 ACM/IEEE Conference on Supercomputing (Supercomputing '89).
[8] Vivek Sarkar,et al. Phasers: a unified deadlock-free construct for collective and point-to-point synchronization , 2008, ICS '08.
[9] Paul Feautrier,et al. Automatic Parallelization in the Polytope Model , 1996, The Data Parallel Programming Model.
[10] J. Ramanujam,et al. Tiling Multidimensional Iteration Spaces for Multicomputers , 1992, J. Parallel Distributed Comput..
[11] Alain Darte,et al. The Data Parallel Programming Model: Foundations, HPF Realization, and Scientific Applications , 1996 .
[12] Sanjay V. Rajopadhye. Dependence Analysis and Parallelizing Transformations , 2002, The Compiler Design Handbook.
[13] Domenico Talia,et al. Euro-Par 2010 - Parallel Processing , 2010, Lecture Notes in Computer Science.
[14] François Irigoin,et al. Supernode partitioning , 1988, POPL '88.
[15] Jack Dongarra,et al. Automatic Blocking of Nested Loops , 1990 .
[16] Guang R. Gao,et al. Toward a Software Infrastructure for the Cyclops-64 Cellular Architecture , 2006, 20th International Symposium on High-Performance Computing in an Advanced Collaborative Environment (HPCS'06).
[17] Monica S. Lam,et al. An affine partitioning algorithm to maximize parallelism and minimize communication , 1999, ICS '99.