Exposing the Locality of Heterogeneous Memory Architectures to HPC Applications
[1] David Daly,et al. The cache and memory subsystems of the IBM POWER8 processor , 2015, IBM J. Res. Dev..
[2] Jeff R. Hammond,et al. User Extensible Heap Manager for Heterogeneous Memory Platforms and Mixed Memory Policies , 2015 .
[3] Jeffrey K. Hollingsworth,et al. Using Hardware Counters to Automatically Improve Memory Performance , 2004, Proceedings of the ACM/IEEE SC2004 Conference.
[4] Thomas R. Gross,et al. Memory management in NUMA multicore systems: trapped between cache contention and interconnect overhead , 2011, ISMM '11.
[5] Gerhard Wellein,et al. LIKWID: Lightweight Performance Tools , 2011, CHPC.
[6] Jean-François Méhaut,et al. NUMA-ICTM: A parallel version of ICTM exploiting memory placement strategies for NUMA machines , 2009, 2009 IEEE International Symposium on Parallel & Distributed Processing.
[7] Philippe Olivier Alexandre Navaux,et al. Multi-core aware process mapping and its impact on communication overhead of parallel applications , 2009, 2009 IEEE Symposium on Computers and Communications.
[8] Viktor K. Prasanna,et al. Optimizing graph algorithms for improved cache performance , 2002, IEEE Transactions on Parallel and Distributed Systems.
[9] Jack J. Dongarra,et al. Analytical modeling and optimization for affinity based thread scheduling on multicore systems , 2009, 2009 IEEE International Conference on Cluster Computing and Workshops.
[10] Gerhard Wellein,et al. LIKWID: A Lightweight Performance-Oriented Tool Suite for x86 Multicore Environments , 2010, 2010 39th International Conference on Parallel Processing Workshops.
[11] Christoph Lameter,et al. NUMA (Non-Uniform Memory Access): An Overview , 2013, ACM Queue.
[12] Alistair P. Rendell,et al. OpenMP and NUMA Architectures I: Investigating Memory Placement on the SGI Origin 3000 , 2003, International Conference on Computational Science.
[13] Arun Jagatheesan,et al. Understanding the Impact of Emerging Non-Volatile Memories on High-Performance, IO-Intensive Computing , 2010, 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis.
[14] François Pellegrini,et al. Scotch and libScotch 5.0 User's Guide , 2007 .
[15] William J. Bowhill,et al. The Xeon® Processor E5-2600 v3: a 22 nm 18-Core Product Family , 2016, IEEE Journal of Solid-State Circuits.
[16] Juan Touriño,et al. Automatic mapping of parallel applications on multicore architectures using the Servet benchmark suite , 2012, Comput. Electr. Eng..
[17] Jeffrey M. Squyres,et al. Advancing application process affinity experimentation: open MPI's LAMA-based affinity interface , 2013, EuroMPI.
[18] William J. Bowhill,et al. 4.5 The Xeon® processor E5-2600 v3: A 22nm 18-core product family , 2015, 2015 IEEE International Solid-State Circuits Conference - (ISSCC) Digest of Technical Papers.
[19] Dong Li,et al. Identifying Opportunities for Byte-Addressable Non-Volatile Memory in Extreme-Scale Scientific Applications , 2012, 2012 IEEE 26th International Parallel and Distributed Processing Symposium.
[20] Jeffrey K. Hollingsworth,et al. Hardware monitors for dynamic page migration , 2008, J. Parallel Distributed Comput..
[21] Tim Brecht,et al. On the importance of parallel application placement in NUMA multiprocessors , 1993 .
[22] Emmanuel Jeannot,et al. Process Placement in Multicore Clusters: Algorithmic Issues and Practical Techniques , 2014, IEEE Transactions on Parallel and Distributed Systems.
[23] Guillaume Mercier,et al. hwloc: A Generic Framework for Managing Hardware Affinities in HPC Applications , 2010, 2010 18th Euromicro Conference on Parallel, Distributed and Network-based Processing.
[24] Sadaf R. Alam,et al. Characterization of Scientific Workloads on Systems with Multi-Core Processors , 2006, 2006 IEEE International Symposium on Workload Characterization.
[25] Brice Goglin,et al. ForestGOMP: An Efficient OpenMP Environment for NUMA Architectures , 2010, International Journal of Parallel Programming.
[26] Simon David Hammond,et al. memkind: An Extensible Heap Memory Manager for Heterogeneous Memory Platforms and Mixed Memory Policies , 2015 .
[27] Informatika. Distributed Management Task Force , 2010 .
[28] Rui Yang,et al. Memory and Thread Placement Effects as a Function of Cache Usage: A Study of the Gaussian Chemistry Code on the SunFire X4600 M2 , 2008, 2008 International Symposium on Parallel Architectures, Algorithms, and Networks (i-span 2008).
[29] Mark Giampapa,et al. Experiences with a Lightweight Supercomputer Kernel: Lessons Learned from Blue Gene's CNK , 2010, 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis.
[30] Masha Sosonkina,et al. Non-uniform Memory Affinity Strategy in Multi-Threaded Sparse Matrix Computations , 2011, HiPC 2012.