Optimizing memory affinity with a hybrid compiler/OS approach
Philippe O. A. Navaux | Matthias Diener | Eduardo H. M. Cruz | Edson Borin | Marco A. Z. Alves
[1] Philippe Olivier Alexandre Navaux, et al. Kernel-Based Thread and Data Mapping for Improved Memory Affinity, 2016, IEEE Transactions on Parallel and Distributed Systems.
[2] Frank Mueller, et al. Hardware profile-guided automatic page placement for ccNUMA systems, 2006, PPoPP '06.
[3] Jean-François Méhaut, et al. Memory Affinity for Hierarchical Shared Memory Multiprocessors, 2009, 2009 21st International Symposium on Computer Architecture and High Performance Computing.
[4] Frank Mueller, et al. Feedback-directed page placement for ccNUMA via hardware-generated memory traces, 2010, J. Parallel Distributed Comput.
[5] Christoph Lameter, et al. An overview of non-uniform memory access, 2013, CACM.
[6] Philippe Olivier Alexandre Navaux, et al. kMAF: Automatic kernel-level management of thread and data affinity, 2014, 2014 23rd International Conference on Parallel Architecture and Compilation (PACT).
[7] Sverker Holmgren, et al. affinity-on-next-touch: increasing the performance of an industrial PDE solver on a cc-NUMA system, 2005, ICS '05.
[8] Wei Wang, et al. Performance analysis of thread mappings with a holistic view of the hardware resources, 2012, 2012 IEEE International Symposium on Performance Analysis of Systems & Software.
[9] Vivien Quéma, et al. Traffic management: a holistic approach to memory placement on NUMA systems, 2013, ASPLOS '13.
[10] Philippe Olivier Alexandre Navaux, et al. Locality vs. Balance: Exploring Data Mapping Policies on NUMA Systems, 2015, 2015 23rd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing.
[11] Philippe Olivier Alexandre Navaux, et al. LAPT: A locality-aware page table for thread and data mapping, 2016, Parallel Comput.
[12] Thomas R. Gross, et al. Matching memory access patterns and data placement for NUMA systems, 2012, CGO '12.
[13] Harish Patil, et al. Pin: building customized program analysis tools with dynamic instrumentation, 2005, PLDI '05.
[14] Philippe Olivier Alexandre Navaux, et al. An Efficient Algorithm for Communication-Based Task Mapping, 2015, 2015 23rd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing.
[15] Philippe Olivier Alexandre Navaux, et al. Characterizing communication and page usage of parallel applications for thread and data mapping, 2015, Perform. Evaluation.
[16] Philippe Olivier Alexandre Navaux, et al. Hardware-Assisted Thread and Data Mapping in Hierarchical Multicore Architectures, 2016, ACM Trans. Archit. Code Optim.
[17] Anoop Gupta, et al. Operating system support for improving data locality on CC-NUMA compute servers, 1996, ASPLOS VII.
[18] Jean-François Méhaut, et al. Improving Memory Affinity of Geophysics Applications on NUMA Platforms Using Minas, 2010, VECPAR.
[19] Carl Staelin, et al. lmbench: Portable Tools for Performance Analysis, 1996, USENIX Annual Technical Conference.
[20] John Shalf, et al. Exascale Computing Technology Challenges, 2010, VECPAR.
[21] Ricardo Bianchini, et al. Using simple page placement policies to reduce the cost of cache fills in coherent shared-memory systems, 1995, Proceedings of 9th International Parallel Processing Symposium.
[22] David W. Nellans, et al. Handling the problems and opportunities posed by multiple on-chip memory controllers, 2010, 2010 19th International Conference on Parallel Architectures and Compilation Techniques (PACT).
[23] Fernando Magno Quintão Pereira, et al. Compiler support for selective page migration in NUMA architectures, 2014, 2014 23rd International Conference on Parallel Architecture and Compilation (PACT).
[24] Jeffrey K. Hollingsworth, et al. Using Hardware Counters to Automatically Improve Memory Performance, 2004, Proceedings of the ACM/IEEE SC2004 Conference.
[25] Jeffrey K. Hollingsworth, et al. Hardware monitors for dynamic page migration, 2008, J. Parallel Distributed Comput.
[26] Kai Li, et al. The PARSEC benchmark suite: Characterization and architectural implications, 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).
[27] Carla Schlatter Ellis, et al. An analysis of dynamic page placement on a NUMA multiprocessor, 1992, SIGMETRICS '92/PERFORMANCE '92.
[28] Anoop Gupta, et al. The SPLASH-2 programs: characterization and methodological considerations, 1995, ISCA.
[29] Eduard Ayguadé, et al. UPMLIB: A Runtime System for Tuning the Memory Performance of OpenMP Programs on Scalable Shared-Memory Multiprocessors, 2000, LCR.