Implications of a metric for performance portability
[1] Yuefan Deng,et al. New trends in high performance computing , 2001, Parallel Computing.
[2] Pradeep Dubey,et al. Can traditional programming bridge the Ninja performance gap for parallel computing applications , 2012, ISCA 2012.
[3] Calvin Lin,et al. Customizing Software Libraries for Performance Portability , 2001, PPSC.
[4] Bradley C. Kuszmaul,et al. The pochoir stencil compiler , 2011, SPAA '11.
[5] Steven G. Johnson,et al. FFTW: an adaptive software architecture for the FFT , 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181).
[6] Stephen A. Jarvis,et al. An investigation of the performance portability of OpenCL , 2013, J. Parallel Distributed Comput..
[7] Yao Zhang,et al. Improving Performance Portability in OpenCL Programs , 2013, ISC.
[8] G. R. Mudalige,et al. OP2: An active library framework for solving unstructured mesh-based applications on multi-core and many-core architectures , 2012, 2012 Innovative Parallel Computing (InPar).
[9] Matt Martineau,et al. GPU-STREAM v2.0: Benchmarking the Achievable Memory Bandwidth of Many-Core Processors Across Diverse Parallel Programming Models , 2016, ISC Workshops.
[10] Jaejin Lee,et al. Performance characterization of the NAS Parallel Benchmarks in OpenCL , 2011, 2011 IEEE International Symposium on Workload Characterization (IISWC).
[11] Stephen A. Jarvis,et al. Developing Performance-Portable Molecular Dynamics Kernels in OpenCL , 2012, 2012 SC Companion: High Performance Computing, Networking Storage and Analysis.
[12] Dan Quinlan,et al. The ROSE Source-to-Source Compiler Infrastructure , 2011.
[13] Matteo Frigo,et al. Cache-oblivious algorithms , 1999, 40th Annual Symposium on Foundations of Computer Science (Cat. No.99CB37039).
[14] Eric Darve,et al. Liszt: A domain specific language for building portable mesh-based PDE solvers , 2011, 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[15] Jack Dongarra,et al. LAPACK: a portable linear algebra library for high-performance computers , 1990, SC.
[16] Tobias Gysi,et al. STELLA: a domain-specific tool for structured grid methods in weather and climate models , 2015, SC15: International Conference for High Performance Computing, Networking, Storage and Analysis.
[17] Lawrence Mitchell,et al. PyOP2: A High-Level Framework for Performance-Portable Simulations on Unstructured Meshes , 2012, 2012 SC Companion: High Performance Computing, Networking Storage and Analysis.
[18] Frédo Durand,et al. Halide: a language and compiler for optimizing parallelism, locality, and recomputation in image processing pipelines , 2013, PLDI 2013.
[19] Alejandro Duran,et al. A Modern Memory Management System for OpenMP , 2016, 2016 Third Workshop on Accelerator Programming Using Directives (WACCPD).
[20] Helmar Burkhart,et al. PATUS: A Code Generation and Autotuning Framework for Parallel Iterative Stencil Computations on Modern Microarchitectures , 2011, 2011 IEEE International Parallel & Distributed Processing Symposium.
[21] Alan Edelman,et al. PetaBricks: a language and compiler for algorithmic choice , 2009, PLDI '09.
[22] Reinhold Bader. Extended interoperation with C , 2013, FORF.
[23] Simon McIntosh-Smith,et al. On the Performance Portability of Structured Grid Codes on Many-Core Computer Architectures , 2014, ISC.
[24] Stephen A. Jarvis,et al. Achieving Portability and Performance through OpenACC , 2014, 2014 First Workshop on Accelerator Programming using Directives.
[25] Guang R. Gao,et al. Performance portability on EARTH: a case study across several parallel architectures , 2005, 19th IEEE International Parallel and Distributed Processing Symposium.
[26] Putt Sakdhnagool,et al. Evaluating Performance Portability of OpenACC , 2014, LCPC.
[27] Daniel Sunderland,et al. Kokkos: Enabling manycore performance portability through polymorphic memory access patterns , 2014, J. Parallel Distributed Comput..
[28] Victor W. Lee,et al. A Metric for Performance Portability , 2016, ArXiv.
[29] Jean-Sylvain Camier. Improving Performance Portability and Exascale Software Productivity with the ∇ Numerical Programming Language , 2015.
[30] Sean Rul,et al. An experimental study on performance portability of OpenCL kernels , 2010, HiPC 2010.
[31] Simon McIntosh-Smith,et al. The OPS Domain Specific Abstraction for Multi-block Structured Grid Computations , 2014, 2014 Fourth International Workshop on Domain-Specific Languages and High-Level Frameworks for High Performance Computing.
[32] Jason Sewall,et al. From “Correct” to “Correct & Efficient” , 2015.
[33] Stephen A. Jarvis,et al. Accelerating Hydrocodes with OpenACC, OpenCL and CUDA , 2012, 2012 SC Companion: High Performance Computing, Networking Storage and Analysis.
[34] James E. Smith,et al. Characterizing computer performance with a single number , 1988, CACM.
[35] Pradeep Dubey,et al. 3.5-D Blocking Optimization for Stencil Computations on Modern CPUs and GPUs , 2010, 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis.
[36] Matt Martineau,et al. Assessing the performance portability of modern parallel programming models using TeaLeaf , 2017, Concurr. Comput. Pract. Exp..
[37] Alejandro Duran,et al. OmpSs: a Proposal for Programming Heterogeneous Multi-Core Architectures , 2011, Parallel Process. Lett..
[38] P. Hanrahan,et al. Sequoia: Programming the Memory Hierarchy , 2006, ACM/IEEE SC 2006 Conference (SC'06).