论文信息 - Manycore performance-portability: Kokkos multidimensional array library

Manycore performance-portability: Kokkos multidimensional array library

Large, complex scientific and engineering application code have a significant investment in computational kernels to implement their mathematical models. Porting these computational kernels to the collection of modern manycore accelerator devices is a major challenge in that these devices have diverse programming models, application programming interfaces (APIs), and performance requirements. The Kokkos Array programming model provides library-based approach to implement computational kernels that are performance-portable to CPU-multicore and GPGPU accelerator devices. This programming model is based upon three fundamental concepts: (1) manycore compute devices each with its own memory space, (2) data parallel kernels and (3) multidimensional arrays. Kernel execution performance is, especially for NVIDIA® devices, extremely dependent on data access patterns. Optimal data access pattern can be different for different manycore devices – potentially leading to different implementations of computational kernels specialized for different devices. The Kokkos Array programming model supports performance-portable kernels by (1) separating data access patterns from computational kernels through a multidimensional array API and (2) introduce device-specific data access mappings when a kernel is compiled. An implementation of Kokkos Array is available through Trilinos [Trilinos website, http://trilinos.sandia.gov/, August 2011].

[1] Gaël Varoquaux,et al. The NumPy Array: A Structure for Efficient Numerical Computation , 2011, Computing in Science & Engineering.

[2] David Abrahams,et al. C++ Template Metaprogramming: Concepts, Tools, and Techniques from Boost and Beyond (C++ In-Depth Series) , 2004 .

[3] James Reinders,et al. Intel® threading building blocks , 2008 .

[4] B. J. Epstein. List of Tables xi , 2012 .

[5] Todd L. Veldhuizen,et al. Arrays in Blitz++ , 1998, ISCOPE.