Inner array inlining for structure of arrays layout

Previous work has shown how the well-studied and SIMD-friendly Structure of Arrays (SOA) data layout strategy can speed up applications in high-performance computing compared to a traditional Array of Structures (AOS) data layout. However, a standard SOA layout cannot handle structures with inner arrays; such structures appear frequently in graph-based applications and object-oriented designs with associations of high multiplicity. This work extends the SOA data layout to structures with array-typed fields. We present different techniques for inlining (embedding) inner arrays into an AOS or SOA layout, as well as the design and implementation of an embedded C++/CUDA DSL that lets programmers write such layouts in a notation close to standard C++. We evaluate several layout strategies with a traffic flow simulation, an important real-world application in transport planning.
