Combining Performance Aspects of Irregular Gauss-Seidel Via Sparse Tiling
Larry Carter | Jonathan Freeman | Jeanne Ferrante | Michelle Mills Strout | Barbara Kreaseck
[1] Ken Kennedy, et al. Compiler blockability of numerical algorithms, 1992, Proceedings Supercomputing '92.
[2] David G. Wonnacott, et al. Achieving Scalable Locality with Time Skewing, 2002, International Journal of Parallel Programming.
[3] Siddhartha Chatterjee, et al. Cache-Efficient Multigrid Algorithms, 2004, Int. J. High Perform. Comput. Appl.
[4] Michael Wolfe, et al. Iteration Space Tiling for Memory Hierarchies, 1987, PPSC.
[5] Mark F. Adams, et al. Evaluation of three unstructured multigrid methods on 3D finite element problems in solid mechanics, 2000.
[6] William Pugh, et al. Iteration Space Slicing for Locality, 1999, LCPC.
[7] Joel H. Saltz, et al. Run-time and compile-time support for adaptive irregular problems, 1994, Proceedings of Supercomputing '94.
[8] Dawson R. Engler, et al. Interface Compilation: Steps Toward Compiling Program Interfaces as Languages, 1999, IEEE Trans. Software Eng.
[9] David A. Padua, et al. MaJIC: compiling MATLAB for speed and responsiveness, 2002, PLDI '02.
[10] Dennis Gannon, et al. Strategies for cache and local memory management by global program transformation, 1988, J. Parallel Distributed Comput.
[11] Eun Im, et al. Optimizing the Performance of Sparse Matrix-Vector Multiplication, 2000.
[12] D. Quinlan, et al. ROSE: Compiler Support for Object-Oriented Frameworks, 1999.
[13] Robert J. Fowler, et al. Increasing Temporal Locality with Skewing and Recursive Blocking, 2001, International Conference on Software Composition.
[14] Vipin Kumar, et al. A Fast and High Quality Multilevel Scheme for Partitioning Irregular Graphs, 1998, SIAM J. Sci. Comput.
[15] Ken Kennedy, et al. Improving memory hierarchy performance for irregular applications, 1999, ICS '99.
[16] V. E. Henson, et al. BoomerAMG: a parallel algebraic multigrid solver and preconditioner, 2002.
[17] Ulrich Rüde, et al. Cache Optimization for Structured and Unstructured Grid Multigrid, 2000.
[18] Ken Kennedy, et al. Improving cache performance in dynamic applications through data and computation reorganization at run time, 1999, PLDI '99.
[19] Ken Kennedy, et al. Optimizing strategies for telescoping languages: procedure strength reduction and procedure vectorization, 2001, ICS '01.
[20] Kei Davis, et al. Optimizing Transformations of Stencil Operations for Parallel Object-Oriented Scientific Frameworks on Cache-Based Architectures, 1998, ISCOPE.
[21] Keshav Pingali, et al. Data-centric multi-level blocking, 1997, PLDI '97.
[22] Monica S. Lam, et al. A data locality optimizing algorithm, 1991, PLDI '91.
[23] Barry F. Smith, et al. Domain Decomposition: Parallel Multilevel Methods for Elliptic Partial Differential Equations, 1996.
[24] Siddhartha Chatterjee, et al. Cache-Efficient Multigrid Algorithms, 2001, Int. J. High Perform. Comput. Appl.
[25] Larry Carter, et al. Localizing non-affine array references, 1999, 1999 International Conference on Parallel Architectures and Compilation Techniques (Cat. No.PR00425).
[26] Chau-Wen Tseng, et al. Improving data locality with loop transformations, 1996, TOPL.
[27] Mark F. Adams. A distributed memory unstructured Gauss-Seidel algorithm for multigrid smoothers, 2001, SC.
[28] Larry Carter, et al. Rescheduling for Locality in Sparse Matrix Computations, 2001, International Conference on Computational Science.
[29] Calvin Lin, et al. Customizing Software Libraries for Performance Portability, 2001, PPSC.
[30] Zhiyuan Li, et al. New tiling techniques to improve cache temporal locality, 1999, PLDI '99.
[31] M. J. Hagger. Automatic domain decomposition on unstructured grids (DOUG), 1998, Advances in Computational Mathematics.
[32] Richard Barrett, et al. Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods, 1994, Other Titles in Applied Mathematics.
[33] Daniel J. Quinlan. ROSE: Compiler Support for Object-Oriented Frameworks, 2000, Parallel Process. Lett.
[34] Chau-Wen Tseng, et al. A Comparison of Locality Transformations for Irregular Codes, 2000, LCR.
[35] François Irigoin, et al. Supernode partitioning, 1988, POPL '88.