A Case Study of User-Defined Code Transformations for Data Layout Optimizations

This paper reports a case study of using the Xevolver code transformation framework for data layout optimizations of high-performance computing (HPC) applications. Because the data structures used vary from one application to another, a code transformation rule for data layout optimization is generally specific to a particular application. The Xevolver framework lets users define their own code transformations, so a custom transformation can be defined that mechanically and consistently translates a specific data representation in an existing code into another one. Our evaluation results demonstrate that such a code transformation is effective in improving memory access efficiency, and hence the performance of an HPC application, without overcomplicating the code.
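As an illustration of what translating one data representation into another can mean, the following is a minimal sketch that assumes the common array-of-structures (AoS) to structure-of-arrays (SoA) conversion. The abstract does not state which layouts the case study targets, and the sketch does not use Xevolver itself; the record fields `x`, `y`, `z` and all function names are hypothetical.

```python
from array import array

# Hypothetical record with three fields; not taken from the paper.
FIELDS = ("x", "y", "z")
NF = len(FIELDS)


def make_aos(n):
    """Array-of-structures: one flat buffer with fields interleaved per element."""
    buf = array("d", [0.0] * (n * NF))
    for i in range(n):
        for f in range(NF):
            # Element i, field f lives at offset i * NF + f.
            buf[i * NF + f] = float(i * 10 + f)
    return buf


def aos_to_soa(aos):
    """Structure-of-arrays: one contiguous buffer per field."""
    # A strided slice gathers every NF-th value, starting at the field offset.
    return {name: aos[f::NF] for f, name in enumerate(FIELDS)}


def sum_x_aos(aos):
    """Sum field x in the AoS layout: stride-NF accesses."""
    return sum(aos[i * NF] for i in range(len(aos) // NF))


def sum_x_soa(soa):
    """Sum field x in the SoA layout: unit-stride accesses."""
    return sum(soa["x"])


aos = make_aos(4)
soa = aos_to_soa(aos)
print(sum_x_aos(aos), sum_x_soa(soa))
```

Both sums compute the same value; only the memory access pattern differs. In a real application, every access of the form `buf[i * NF + f]` would have to be rewritten consistently, which is the kind of repetitive, application-specific edit that a user-defined transformation rule is meant to automate.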
