Ikra-Cpp: A C++/CUDA DSL for Object-Oriented Programming with Structure-of-Arrays Layout

Structure of Arrays (SOA) is a well-studied data layout technique for SIMD architectures. Previous work has shown that it can speed up high-performance computing applications severalfold compared to a traditional Array of Structures (AOS) layout. However, most programmers are used to AOS-style programming, which is more readable and easier to maintain. We present Ikra-Cpp, an embedded DSL for object-oriented programming in C++/CUDA. Ikra-Cpp's notation is very close to standard AOS-style C++ code, but data is laid out as SOA. This gives programmers the performance benefit of SOA and the expressiveness of AOS-style object-oriented programming at the same time. Ikra-Cpp is well integrated with C++ and lets programmers use C++ notation and syntax for classes, fields, member functions, constructors, and instance creation.
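
To make the AOS/SOA distinction concrete, here is a minimal hand-written sketch in plain C++. It does not use Ikra-Cpp and does not show its API. The names `BodyAoS`, `BodiesSoA`, `to_soa`, `step_aos`, and `step_soa` are hypothetical and chosen only for illustration. The sketch shows the same position update written against both layouts. In the SOA version every loop iteration reads neighboring elements of the same array, which is the access pattern that SIMD units and GPU memory coalescing favor.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// AOS: one struct per object; all fields of one object are adjacent in memory.
struct BodyAoS {
  float x, y;    // position
  float vx, vy;  // velocity
};

// SOA: one array per field; the same field of all objects is contiguous.
struct BodiesSoA {
  std::vector<float> x, y, vx, vy;
  std::size_t size() const { return x.size(); }
};

// Hypothetical helper: copy an AOS container into the SOA layout.
BodiesSoA to_soa(const std::vector<BodyAoS>& aos) {
  BodiesSoA soa;
  for (const BodyAoS& b : aos) {
    soa.x.push_back(b.x);
    soa.y.push_back(b.y);
    soa.vx.push_back(b.vx);
    soa.vy.push_back(b.vy);
  }
  return soa;
}

// AOS update: readable, object-at-a-time code with strided field access.
void step_aos(std::vector<BodyAoS>& bodies, float dt) {
  for (BodyAoS& b : bodies) {
    b.x += b.vx * dt;
    b.y += b.vy * dt;
  }
}

// SOA update: same arithmetic, but each field is read with unit stride.
void step_soa(BodiesSoA& bodies, float dt) {
  // All field arrays must describe the same number of objects.
  assert(bodies.y.size() == bodies.size());
  assert(bodies.vx.size() == bodies.size());
  assert(bodies.vy.size() == bodies.size());
  for (std::size_t i = 0; i < bodies.size(); ++i) {
    bodies.x[i] += bodies.vx[i] * dt;
    bodies.y[i] += bodies.vy[i] * dt;
  }
}
```

Calling `step_aos` on a `std::vector<BodyAoS>` and `step_soa` on the result of `to_soa` for the same input produces identical positions. The hand-written SOA version loses the object-oriented notation (there is no `Body` object to call methods on), which is the gap the abstract says Ikra-Cpp aims to close.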
