Assessing the Performance of OpenMP Programs on the Intel Xeon Phi
Dirk Schmidl | Christian Terboven | Matthias S. Müller | Sandra Wienke | Tim Cramer
[1] Hermann Ney, et al. Features for image retrieval: an experimental comparison, 2008, Information Retrieval.
[2] Dirk Schmidl, et al. Task-Parallel Programming on NUMA Architectures, 2012, Euro-Par.
[3] David H. Bailey, et al. The NAS Parallel Benchmarks, 1991, Int. J. High Perform. Comput. Appl.
[4] Christian Brecher, et al. Simulation of bevel gear cutting with GPGPUs—performance and productivity, 2011, Computer Science - Research and Development.
[5] Dirk Schmidl, et al. Assessing OpenMP Tasking Implementations on NUMA Architectures, 2012, IWOMP.
[6] Matthias S. Müller, et al. OpenMP in a Heterogeneous World, 2012, Lecture Notes in Computer Science.
[7] Dirk Schmidl, et al. Data and thread affinity in OpenMP programs, 2008, MAW '08.
[8] Carl Staelin, et al. lmbench: Portable Tools for Performance Analysis, 1996, USENIX Annual Technical Conference.
[9] Kevin Skadron, et al. A performance study of general-purpose applications on graphics processors using CUDA, 2008, J. Parallel Distributed Comput.
[10] Samuel Williams, et al. Roofline: an insightful visual performance model for multicore architectures, 2009, CACM.
[11] Mikhail Smelyanskiy, et al. Efficient backprojection-based synthetic aperture radar computation with many-core processors, 2012, HiPC 2012.
[12] Malcolm P. Atkinson, et al. An Adaptive, Scalable, and Portable Technique for Speeding Up MPI-Based Applications, 2012, Euro-Par.
[13] Timothy A. Davis, et al. The University of Florida sparse matrix collection, 2011, TOMS.
[14] Michael Garland, et al. Implementing sparse matrix-vector multiplication on throughput-oriented processors, 2009, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis.
[15] Samuel Williams, et al. Optimization of geometric multigrid for emerging multi- and manycore processors, 2012, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis.
[16] C. Bischof, et al. Nested OpenMP for Efficient Computation of 3D Critical Points in Multi-Block CFD Datasets, 2006, ACM/IEEE SC 2006 Conference (SC'06).
[17] Michael Klemm, et al. OpenMP Programming on Intel Xeon Phi Coprocessors: An Early Performance Comparison, 2012, MARC@RWTH.
[18] H. Martin Bücker, et al. Parallel Minimum p-Norm Solution of the Neuromagnetic Inverse Problem for Realistic Signals Using Exact Hessian-Vector Products, 2008, SIAM J. Sci. Comput.
[19] James Demmel, et al. Benchmarking GPUs to tune dense linear algebra, 2008, HiPC 2008.
[20] M. Hestenes, et al. Methods of conjugate gradients for solving linear systems, 1952.
[21] J. M. Bull, et al. Measuring Synchronisation and Scheduling Overheads in OpenMP, 2007.