User-Specified and Automatic Data Layout Selection for Portable Performance
[1] Franz Franchetti, et al. Data Layout Transformation for Stencil Computations on Short-Vector SIMD Architectures, 2011, CC.
[2] Chandra Krintz, et al. Cache-conscious data placement, 1998, ASPLOS VIII.
[3] Uday Bondhugula, et al. Data Layout Transformation for Enhancing Data Locality on NUCA Chip Multiprocessors, 2009, 2009 18th International Conference on Parallel Architectures and Compilation Techniques.
[4] Sandya Mannarswamy, et al. Structure Layout Optimization for Multithreaded Programs, 2007, International Symposium on Code Generation and Optimization (CGO'07).
[5] Mahmut T. Kandemir, et al. Improving whole-program locality using intra-procedural and inter-procedural transformations, 2005, J. Parallel Distributed Comput.
[6] Mahmut T. Kandemir, et al. A framework for interprocedural locality optimization using both loop and data layout transformations, 1999, Proceedings of the 1999 International Conference on Parallel Processing.
[7] T. Jones, et al. TALC: A Simple C Language Extension For Improved Performance and Code Maintainability, 2007.
[8] Ian Karlin, et al. Tuning the LULESH Mini-app for Current and Future Hardware, 2013.
[9] Rudolf Eigenmann, et al. Compiler Infrastructure, 2013, International Journal of Parallel Programming.
[10] James R. Larus, et al. Cache-conscious structure layout, 1999, PLDI '99.
[11] Ken Kennedy, et al. Improving effective bandwidth through compiler enhancement of global cache reuse, 2001, Proceedings 15th International Parallel and Distributed Processing Symposium (IPDPS 2001).
[12] Balaram Sinharoy, et al. IBM POWER7 multicore server processor, 2011.
[13] Michael F. P. O'Boyle, et al. Efficient parallelisation using combined loop and data transformations, 1999, 1999 International Conference on Parallel Architectures and Compilation Techniques (Cat. No. PR00425).
[14] Ken Kennedy, et al. Inter-array Data Regrouping, 1999, LCPC.
[15] Guojing Cong, et al. Application data prefetching on the IBM Blue Gene/Q supercomputer, 2012, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis.
[16] Trishul M. Chilimbi, et al. Cache-conscious coallocation of hot data streams, 2006, PLDI '06.
[17] Mahmut T. Kandemir, et al. Optimizing Data Layouts for Parallel Computation on Multicores, 2011, 2011 International Conference on Parallel Architectures and Compilation Techniques.