CUDA Memory Optimizations for Large Data-Structures in the Gravit Simulator
[1] James R. Larus, et al. Cache-conscious structure definition, 1999, PLDI '99.
[2] Michael F. P. O'Boyle, et al. The effect of cache models on iterative compilation for combined tiling and unrolling, 2004, Concurr. Comput. Pract. Exp.
[3] Rudolf Eigenmann, et al. SPEComp: A New Benchmark Suite for Measuring Parallel Computer Performance, 2001, WOMPAT.
[4] Piet Hut, et al. A hierarchical O(N log N) force-calculation algorithm, 1986, Nature.
[5] Kathryn S. McKinley, et al. Tile size selection using cache organization and data layout, 1995, PLDI '95.
[6] Chi-Bang Kuan, et al. Automated Empirical Optimization, 2011, Encyclopedia of Parallel Computing.
[7] James R. Larus, et al. Cache-conscious structure layout, 1999, PLDI '99.
[8] Gang Ren, et al. A comparison of empirical and model-driven optimization, 2003, PLDI '03.
[9] Yuefan Deng, et al. New trends in high performance computing, 2001, Parallel Computing.
[10] Wen-mei W. Hwu, et al. Program optimization space pruning for a multithreaded GPU, 2008, CGO '08.
[11] Chau-Wen Tseng, et al. Data transformations for eliminating conflict misses, 1998, PLDI.
[12] Vikram S. Adve, et al. Automatic pool allocation: improving performance by controlling data structure layout in the heap, 2005, PLDI '05.
[13] Hiroshi Nakamura, et al. Augmenting Loop Tiling with Data Alignment for Improved Cache Performance, 1999, IEEE Trans. Computers.
[14] Michael E. Wolf, et al. The cache performance and optimizations of blocked algorithms, 1991, ASPLOS IV.
[15] Tom Davis, et al. OpenGL programming guide: the official guide to learning OpenGL, 1993.