Enhancing Data Locality for Dynamic Simulations through Asynchronous Data Transformations and Adaptive Control
Bo Wu | Xipeng Shen | Eddy Z. Zhang
[1] Joel H. Saltz, et al. The design and implementation of a parallel unstructured Euler solver using software primitives, ICASE Report No. 92-12, 1992.
[2] Joel H. Saltz, et al. Communication Optimizations for Irregular Scientific Computations on Distributed Memory Architectures, 1994, J. Parallel Distributed Comput.
[3] Gagan Agrawal, et al. Compiler and runtime support for enabling generalized reduction computations on heterogeneous parallel configurations, 2010, ICS '10.
[4] Keshav Pingali, et al. Optimistic parallelism benefits from data partitioning, 2008, ASPLOS.
[5] John Mellor-Crummey, et al. Managing locality in grand challenge applications: a case study of the gyrokinetic toroidal code, 2008.
[6] Xipeng Shen, et al. A cross-input adaptive framework for GPU program optimizations, 2009, 2009 IEEE International Symposium on Parallel & Distributed Processing.
[7] Chau-Wen Tseng, et al. Improving Locality for Adaptive Irregular Scientific Codes, 2000, LCPC.
[8] Rainald Löhner, et al. Running unstructured grid-based CFD solvers on modern graphics hardware, 2011.
[9] Larry Carter, et al. Compile-time composition of run-time data and iteration reorderings, 2003, PLDI '03.
[10] Xipeng Shen, et al. On-the-fly elimination of dynamic irregularities for GPU computing, 2011, ASPLOS XVI.
[11] Xipeng Shen, et al. Fine-Grained Treatment to Synchronizations in GPU-to-CPU Translation, 2011, LCPC.
[12] Dror Rawitz, et al. The hardness of cache conscious data placement, 2002, POPL '02.
[13] Dimitri J. Mavriplis, et al. The design and implementation of a parallel unstructured Euler solver using software primitives, 1992.
[14] Kwang-Moo Choe, et al. Region-based parallelization of irregular reductions on explicitly managed memory hierarchies, 2009, The Journal of Supercomputing.
[15] Yi Yang, et al. A GPGPU compiler for memory optimization and parallelism management, 2010, PLDI '10.
[16] Shahid H. Bokhari, et al. A Partitioning Strategy for Nonuniform Problems on Multiprocessors, 1987, IEEE Transactions on Computers.
[17] Xipeng Shen, et al. Streamlining GPU applications on the fly: thread divergence elimination through runtime thread-data remapping, 2010, ICS '10.
[18] Rudolf Eigenmann, et al. OpenMP to GPGPU: a compiler framework for automatic translation and optimization, 2009, PPoPP '09.
[19] Larry Carter, et al. Localizing non-affine array references, 1999, 1999 International Conference on Parallel Architectures and Compilation Techniques.
[20] Ken Kennedy, et al. Improving memory hierarchy performance for irregular applications, 1999, ICS '99.
[21] Chen Ding, et al. Array regrouping and structure splitting using whole-program reference affinity, 2004, PLDI '04.
[22] Ken Kennedy, et al. Improving effective bandwidth through compiler enhancement of global cache reuse, 2001, Proceedings 15th International Parallel and Distributed Processing Symposium (IPDPS 2001).
[23] Uday Bondhugula, et al. A compiler framework for optimization of affine loop nests for GPGPUs, 2008, ICS '08.
[24] Chau-Wen Tseng, et al. Exploiting locality for irregular scientific codes, 2006, IEEE Transactions on Parallel and Distributed Systems.
[25] Xipeng Shen, et al. Correctly Treating Synchronizations in Compiling Fine-Grained SPMD-Threaded Programs for CPU, 2011, 2011 International Conference on Parallel Architectures and Compilation Techniques.
[26] A. H. Sherman, et al. Comparative Analysis of the Cuthill–McKee and the Reverse Cuthill–McKee Ordering Algorithms for Sparse Matrices, 1976.
[27] Vipin Kumar, et al. A Fast and High Quality Multilevel Scheme for Partitioning Irregular Graphs, 1998, SIAM J. Sci. Comput.
[28] Trevor Hastie, et al. The Elements of Statistical Learning, 2001.
[29] Mike Murphy, et al. Efficient compilation of fine-grained SPMD-threaded programs for multicore CPUs, 2010, CGO '10.