Kokkos: Enabling manycore performance portability through polymorphic memory access patterns
[1] Roger P. Pawlowski, et al. Automating embedded analysis capabilities and managing software complexity in multiphysics simulation, Part II: Application to partial differential equations, 2012, Sci. Program.
[2] Daniel Sunderland, et al. Manycore performance-portability: Kokkos multidimensional array library, 2012.
[3] Eduard Ayguadé, et al. Hierarchical Task-Based Programming With StarSs, 2009, Int. J. High Perform. Comput. Appl.
[4] Guillaume Mercier, et al. hwloc: A Generic Framework for Managing Hardware Affinities in HPC Applications, 2010, 2010 18th Euromicro Conference on Parallel, Distributed and Network-based Processing.
[5] Timothy G. Mattson, et al. Patterns for parallel programming, 2004.
[6] Michael Garland, et al. Efficient Sparse Matrix-Vector Multiplication on CUDA, 2008.
[7] Alan B. Williams, et al. A Light-weight API for Portable Multicore Programming, 2010, 2010 18th Euromicro Conference on Parallel, Distributed and Network-based Processing.
[8] William Gropp, et al. Efficient Management of Parallelism in Object-Oriented Numerical Software Libraries, 1997, SciTools.
[9] Julien Langou, et al. A Class of Parallel Tiled Linear Algebra Algorithms for Multicore Architectures, 2007, Parallel Comput.
[10] Jean-François Méhaut, et al. SGPU-2: a runtime system for using large applications on clusters of hybrid nodes, 2011.
[11] Edward A. Luke, et al. Loci: a rule-based framework for parallel multi-disciplinary simulation synthesis, 2005, J. Funct. Program.
[12] Vassilios V. Dimakopoulos, et al. HOMPI: A Hybrid Programming Framework for Expressing and Deploying Task-Based Parallelism, 2011, Euro-Par.
[13] Daniel Sunderland, et al. Kokkos Array performance-portable manycore programming model, 2012, PMAM '12.
[14] Bruno Raffin, et al. XKaapi: A Runtime System for Data-Flow Task Programming on Heterogeneous Architectures, 2013, 2013 IEEE 27th International Symposium on Parallel and Distributed Processing.
[15] Dhabaleswar K. Panda, et al. High Performance RDMA-Based MPI Implementation over InfiniBand, 2003, ICS '03.
[16] Roger P. Pawlowski, et al. Automating embedded analysis capabilities and managing software complexity in multiphysics simulation, Part I: Template-based generic programming, 2012, Sci. Program.
[17] Daniel Sunderland, et al. Multicore/GPGPU Portable Computational Kernels via Multidimensional Arrays, 2011, 2011 IEEE International Conference on Cluster Computing.
[18] Ade Miller, et al. C++ AMP: Accelerated Massive Parallelism with Microsoft Visual C++, 2012.
[19] Robert A. van de Geijn, et al. Towards Usable and Lean Parallel Linear Algebra Libraries, 1996.
[20] James Reinders, et al. Intel® Threading Building Blocks, 2008.
[21] Laxmikant V. Kalé, et al. CHARM++: a portable concurrent object oriented system based on C++, 1993, OOPSLA '93.
[22] Daniel Sunderland, et al. Manycore performance-portability: Kokkos multidimensional array library, 2012, Sci. Program.
[23] Alejandro Duran, et al. OmpSs: a Proposal for Programming Heterogeneous Multi-Core Architectures, 2011, Parallel Process. Lett.
[24] Steve Plimpton, et al. Fast parallel algorithms for short-range molecular dynamics, 1993.
[25] Cédric Augonnet, et al. StarPU: a unified platform for task scheduling on heterogeneous multicore architectures, 2011, Concurr. Comput. Pract. Exp.