OpenMP scalability limits on large SMPs and how to extend them
[1] Willy Zwaenepoel,et al. OpenMP on Networks of Workstations , 1998, Proceedings of the IEEE/ACM SC98 Conference.
[2] Dirk Schmidl,et al. Evaluating OpenMP Performance on Thousands of Cores on the Numascale Architecture , 2015, PARCO.
[3] William Gropp,et al. Locality-Optimized Mixed Static/Dynamic Scheduling for Improving Load Balancing on SMPs , 2014, EuroMPI/ASIA.
[4] J. Mark Bull,et al. A microbenchmark suite for OpenMP 2.0 , 2001, CARN.
[5] Eduard Ayguadé,et al. Leveraging Transparent Data Distribution in OpenMP via User-Level Dynamic Page Migration , 2000, ISHPC.
[6] Michael Klemm,et al. OpenMP Programming on Intel Xeon Phi Coprocessors: An Early Performance Comparison , 2012, MARC@RWTH.
[7] D. Lenoski,et al. The SGI Origin: A ccNUMA Highly Scalable Server , 1997, Conference Proceedings. The 24th Annual International Symposium on Computer Architecture.
[8] Dirk Schmidl,et al. Binding Nested OpenMP Programs on Hierarchical Memory Architectures , 2010, IWOMP.
[9] Brice Goglin,et al. Dynamic Task and Data Placement over NUMA Architectures: An OpenMP Runtime Perspective , 2009, IWOMP.
[10] Fiona Reid,et al. A Microbenchmark Suite for OpenMP Tasks , 2012, IWOMP.
[11] Georg Hager,et al. Introducing a Performance Model for Bandwidth-Limited Loop Kernels , 2009, PPAM.
[12] Martin Oberlack,et al. Extensive strain along gradient trajectories in the turbulent kinetic energy field , 2011 .
[13] Barbara M. Chapman,et al. A Runtime Implementation of OpenMP Tasks , 2011, IWOMP.
[14] Dirk Schmidl,et al. An OpenMP Extension Library for Memory Affinity , 2014, IWOMP.
[15] Haoqiang Jin,et al. Comparing the OpenMP, MPI, and Hybrid Programming Paradigm on an SMP Cluster , 2003 .
[16] Samuel Williams,et al. Roofline: an insightful visual performance model for multicore architectures , 2009, CACM.
[17] Lipo Wang,et al. Dissipation element analysis of scalar fields in turbulence , 2006 .
[18] Gerhard Wellein,et al. Introduction to High Performance Computing for Scientists and Engineers , 2010, Chapman and Hall / CRC computational science series.
[19] William Gropp,et al. Hybrid Static/dynamic Scheduling for Already Optimized Dense Matrix Factorization , 2011, 2012 IEEE 26th International Parallel and Distributed Processing Symposium.
[20] David H. Bailey,et al. The NAS Parallel Benchmarks , 1991, Int. J. High Perform. Comput. Appl..
[21] Dirk Schmidl,et al. First Experiences with Intel Cluster OpenMP , 2008, IWOMP.
[22] Sabela Ramos,et al. Modeling communication in cache-coherent SMP systems: a case-study with Xeon Phi , 2013, HPDC.
[23] Jonathan Harris,et al. Extending OpenMP For NUMA Machines , 2000, ACM/IEEE SC 2000 Conference (SC'00).
[24] Christine Morin,et al. A Case for Single System Image Cluster Operating Systems: The Kerrighed Approach , 2003, Parallel Process. Lett..
[25] Matthias S. Müller,et al. SPEC OMP2012 - An Application Benchmark Suite for Parallel Systems Using OpenMP , 2012, IWOMP.
[26] Eduard Ayguadé,et al. Exploiting multiple levels of parallelism in OpenMP: a case study , 1999, Proceedings of the 1999 International Conference on Parallel Processing.
[27] Dirk Schmidl,et al. How to Reconcile Event-Based Performance Analysis with Tasking in OpenMP , 2010, IWOMP.
[28] Anthony Skjellum,et al. Using MPI: portable parallel programming with the message-passing interface, 2nd Edition , 1999, Scientific and engineering computation series.
[29] Dirk Schmidl,et al. Score-P: A Unified Performance Measurement System for Petascale Applications , 2010, CHPC.
[30] Christine Morin,et al. Towards an efficient single system image cluster operating system , 2002, Fifth International Conference on Algorithms and Architectures for Parallel Processing, 2002. Proceedings..
[31] Bernd Mohr,et al. Design and Prototype of a Performance Tool Interface for OpenMP , 2002, The Journal of Supercomputing.
[32] Dirk Schmidl,et al. Suitability of Performance Tools for OpenMP Task-Parallel Programs , 2013, Parallel Tools Workshop.
[33] Dirk Schmidl,et al. Assessing the Performance of OpenMP Programs on the Intel Xeon Phi , 2013, Euro-Par.
[34] Christian Terboven,et al. The Design of OpenMP Thread Affinity , 2012, IWOMP.
[35] Dirk Schmidl,et al. Data and thread affinity in OpenMP programs , 2008, MAW '08.
[36] Christine Morin,et al. Kerrighed: A SSI Cluster OS Running OpenMP , 2003 .
[37] Dirk Schmidl,et al. Visualization of Memory Access Behavior on Hierarchical NUMA Architectures , 2014, 2014 First Workshop on Visual Performance Analysis.
[38] Dirk Schmidl,et al. Score-P: A Joint Performance Measurement Run-Time Infrastructure for Periscope, Scalasca, TAU, and Vampir , 2011, Parallel Tools Workshop.
[39] Guillaume Mercier,et al. hwloc: A Generic Framework for Managing Hardware Affinities in HPC Applications , 2010, 2010 18th Euromicro Conference on Parallel, Distributed and Network-based Processing.
[40] Carl Staelin,et al. lmbench: Portable Tools for Performance Analysis , 1996, USENIX Annual Technical Conference.
[41] Wolfgang E. Nagel,et al. VAMPIR: Visualization and Analysis of MPI Resources , 2010 .
[42] Lisa Noordergraaf,et al. Performance experiences on Sun's Wildfire prototype , 1999, SC '99.
[43] Dirk Schmidl,et al. Performance Analysis Techniques for Task-Based OpenMP Applications , 2012, IWOMP.
[44] Bernd Mohr,et al. The Scalasca performance toolset architecture , 2010, Concurr. Comput. Pract. Exp..
[45] Bo Wang,et al. Evaluating the Energy Consumption of OpenMP Applications on Haswell Processors , 2015, IWOMP.
[46] Dirk Schmidl,et al. Scaling OpenMP Programs to Thousand Cores on the Numascale Architecture , 2014 .
[47] Dirk Schmidl,et al. Assessing OpenMP Tasking Implementations on NUMA Architectures , 2012, IWOMP.
[48] Thomas Bemmerl,et al. Affinity-On-Next-Touch: An Extension to the Linux Kernel for NUMA Architectures , 2009, PPAM.
[49] Dirk Schmidl,et al. How to Scale Nested OpenMP Applications on the ScaleMP vSMP Architecture , 2010, 2010 IEEE International Conference on Cluster Computing.
[50] Mitsuhisa Sato,et al. Cluster-enabled OpenMP: An OpenMP compiler for the SCASH software distributed shared memory system , 2001, Sci. Program..
[51] Alejandro Duran,et al. Evaluation of OpenMP Task Scheduling Strategies , 2008, IWOMP.
[52] Dirk Schmidl,et al. Task-Parallel Programming on NUMA Architectures , 2012, Euro-Par.
[53] Samuel Williams,et al. The Landscape of Parallel Computing Research: A View from Berkeley , 2006 .
[54] Dirk Schmidl,et al. Trajectory-Search on ScaleMP's vSMP Architecture , 2011, International Conference on Parallel Computing.
[55] Rudolf Eigenmann,et al. SPEComp: A New Benchmark Suite for Measuring Parallel Computer Performance , 2001, WOMPAT.
[56] Dirk Schmidl,et al. Performance Characteristics of Large SMP Machines , 2013, IWOMP.
[57] Dirk Schmidl,et al. Towards a Performance Engineering Workflow for OpenMP 4.0 , 2013, PARCO.
[58] Sverker Holmgren,et al. affinity-on-next-touch: increasing the performance of an industrial PDE solver on a cc-NUMA system , 2005, ICS '05.
[59] Ramesh Subramonian,et al. LogP: towards a realistic model of parallel computation , 1993, PPOPP '93.
[60] Amnon Barak,et al. The MOSIX Distributed Operating System: Load Balancing for UNIX , 1993 .
[61] Torsten Hoefler,et al. Using automated performance modeling to find scalability bugs in complex codes , 2013, 2013 SC - International Conference for High Performance Computing, Networking, Storage and Analysis (SC).