Automatic Data Layout Optimizations for GPUs

Memory optimizations have become increasingly important for fully exploiting the computational power of modern GPUs. The arrangement of data in memory strongly affects performance, and it is difficult for GPU programmers to identify a well-suited data layout. Classical data layout transformations include grouping together data fields that have similar access patterns and transforming Array-of-Structures (AoS) into Structure-of-Arrays (SoA).
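To make the AoS-to-SoA transformation concrete, the following is a minimal host-side sketch in C++. The `Particle` record, its fields, and the helper names `to_soa`, `to_aos`, and `total_mass` are hypothetical and chosen only for illustration; they are not taken from the paper. The usual motivation on GPUs is that, in the SoA form, adjacent threads reading the same field touch consecutive memory addresses, which tends to favor coalesced accesses.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical record type, used only for illustration.
struct Particle {
    float x, y, z;
    float mass;
};

// Array-of-Structures (AoS): one contiguous array of whole records.
// Field `mass` of consecutive elements is sizeof(Particle) bytes apart.
using ParticlesAoS = std::vector<Particle>;

// Structure-of-Arrays (SoA): one contiguous array per field.
// Field `mass` of consecutive elements is sizeof(float) bytes apart.
struct ParticlesSoA {
    std::vector<float> x, y, z, mass;
};

// Convert AoS to SoA by scattering each record's fields into per-field arrays.
ParticlesSoA to_soa(const ParticlesAoS& aos) {
    ParticlesSoA soa;
    soa.x.reserve(aos.size());
    soa.y.reserve(aos.size());
    soa.z.reserve(aos.size());
    soa.mass.reserve(aos.size());
    for (const Particle& p : aos) {
        soa.x.push_back(p.x);
        soa.y.push_back(p.y);
        soa.z.push_back(p.z);
        soa.mass.push_back(p.mass);
    }
    return soa;
}

// Convert SoA back to AoS; all per-field arrays must have the same length.
ParticlesAoS to_aos(const ParticlesSoA& soa) {
    const std::size_t n = soa.x.size();
    assert(soa.y.size() == n && soa.z.size() == n && soa.mass.size() == n);
    ParticlesAoS aos;
    aos.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        aos.push_back(Particle{soa.x[i], soa.y[i], soa.z[i], soa.mass[i]});
    }
    return aos;
}

// A single-field traversal: in the SoA layout it reads only the `mass` array.
float total_mass(const ParticlesSoA& soa) {
    float sum = 0.0f;
    for (float m : soa.mass) {
        sum += m;
    }
    return sum;
}
```

A kernel that reads only `mass` would, in the AoS layout, stride over the unused `x`, `y`, and `z` values of every record, whereas the SoA layout keeps the values it needs densely packed. Whether SoA is actually the better choice depends on which fields are accessed together, which is why choosing a layout by hand is hard.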
