Hardware-Assisted Thread and Data Mapping in Hierarchical Multicore Architectures
Philippe Olivier Alexandre Navaux | Laércio Lima Pilla | Matthias Diener | Eduardo H. M. Cruz
[1] Jean-François Méhaut,et al. Memory Affinity for Hierarchical Shared Memory Multiprocessors , 2009, 2009 21st International Symposium on Computer Architecture and High Performance Computing.
[2] Dirk Schmidl,et al. Data and thread affinity in openmp programs , 2008, MAW '08.
[3] Frank Mueller,et al. Feedback-directed page placement for ccNUMA via hardware-generated memory traces , 2010, J. Parallel Distributed Comput..
[4] Alessandro Pellegrini,et al. OS-Based NUMA Optimization: Tackling the Case of Truly Multi-thread Applications with Non-partitioned Virtual Page Accesses , 2016, 2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid).
[5] Yurii A. Vlasov,et al. Technologies for exascale systems , 2011, IBM J. Res. Dev..
[6] Fernando Magno Quintão Pereira,et al. Compiler support for selective page migration in NUMA architectures , 2014, 2014 23rd International Conference on Parallel Architecture and Compilation (PACT).
[7] Simon W. Moore,et al. A communication characterisation of Splash-2 and Parsec , 2009, 2009 IEEE International Symposium on Workload Characterization (IISWC).
[8] Kunle Olukotun,et al. The Future of Microprocessors , 2005, ACM Queue.
[9] Vivien Quéma,et al. Traffic management: a holistic approach to memory placement on NUMA systems , 2013, ASPLOS '13.
[10] Philippe Olivier Alexandre Navaux,et al. An Efficient Algorithm for Communication-Based Task Mapping , 2015, 2015 23rd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing.
[11] Hermann Lederer,et al. Parallel Computing: From Multicores and GPU's to Petascale , 2010 .
[12] Vipin Kumar,et al. A Fast and High Quality Multilevel Scheme for Partitioning Irregular Graphs , 1998, SIAM J. Sci. Comput..
[13] Antonio Robles,et al. Increasing the Effectiveness of Directory Caches by Avoiding the Tracking of Noncoherent Memory Blocks , 2013, IEEE Transactions on Computers.
[14] Zeshan Chishti,et al. Optimizing replication, communication, and capacity allocation in CMPs , 2005, 32nd International Symposium on Computer Architecture (ISCA'05).
[15] Josep Torrellas. Architectures for Extreme-Scale Computing , 2009, Computer.
[16] S. Eranian. Perfmon2: a flexible performance monitoring interface for Linux , 2010 .
[17] Laércio Lima Pilla,et al. Hardware-Assisted Thread and Data Mapping in Hierarchical Multicore Architectures , 2016 .
[18] Jean Roman,et al. Exploiting Intensive Multithreading for the Efficient Simulation of 3D Seismic Wave Propagation , 2008, 2008 11th IEEE International Conference on Computational Science and Engineering.
[19] Manuel Prieto,et al. Survey of scheduling techniques for addressing shared resources in multicore processors , 2012, CSUR.
[20] Fredrik Larsson,et al. Simics: A Full System Simulation Platform , 2002, Computer.
[21] Francisco J. Cazorla,et al. Thread Assignment of Multithreaded Network Applications in Multicore/Multithreaded Processors , 2013, IEEE Transactions on Parallel and Distributed Systems.
[22] Sverker Holmgren,et al. affinity-on-next-touch: increasing the performance of an industrial PDE solver on a cc-NUMA system , 2005, ICS '05.
[23] Niraj K. Jha,et al. GARNET: A detailed on-chip network model inside a full-system simulator , 2009, 2009 IEEE International Symposium on Performance Analysis of Systems and Software.
[24] Milo M. K. Martin,et al. Why on-chip cache coherence is here to stay , 2012, Commun. ACM.
[25] Jean-François Méhaut,et al. Parallel simulations of seismic wave propagation on NUMA architectures , 2009, PARCO.
[26] Frank Mueller,et al. Hardware profile-guided automatic page placement for ccNUMA systems , 2006, PPoPP '06.
[27] Michael Frumkin,et al. The OpenMP Implementation of NAS Parallel Benchmarks and its Performance , 2013 .
[28] Milo M. K. Martin,et al. Multifacet's general execution-driven multiprocessor simulator (GEMS) toolset , 2005, CARN.
[29] Philippe Olivier Alexandre Navaux,et al. Communication-aware process and thread mapping using online communication detection , 2015, Parallel Comput..
[30] Michael Ott,et al. autopin - Automated Optimization of Thread-to-Core Pinning on Multicore Systems , 2011, Trans. High Perform. Embed. Archit. Compil..
[31] Alexandra Fedorova,et al. Addressing shared resource contention in multicore processors via scheduling , 2010, ASPLOS XV.
[32] Anoop Gupta,et al. OS Support for Improving Data Locality on CC-NUMA Compute Servers , 1996 .
[33] Oded Lempel,et al. 2nd Generation Intel® Core Processor Family: Intel® Core i7, i5 and i3 , 2011, 2011 IEEE Hot Chips 23 Symposium (HCS).
[34] Philippe Olivier Alexandre Navaux,et al. kMAF: Automatic kernel-level management of thread and data affinity , 2014, 2014 23rd International Conference on Parallel Architecture and Compilation (PACT).
[35] Guillaume Mercier,et al. hwloc: A Generic Framework for Managing Hardware Affinities in HPC Applications , 2010, 2010 18th Euromicro Conference on Parallel, Distributed and Network-based Processing.
[36] Rob H. Bisseling,et al. Parallel hypergraph partitioning for scientific computing , 2006, Proceedings 20th IEEE International Parallel & Distributed Processing Symposium.
[37] Takeshi Ogasawara. NUMA-aware memory manager with dominant-thread-based copying GC , 2009, OOPSLA.
[38] Samuel Thibault,et al. Structuring the execution of OpenMP applications for multicore architectures , 2010, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS).
[39] Jeffrey K. Hollingsworth,et al. Hardware monitors for dynamic page migration , 2008, J. Parallel Distributed Comput..
[40] Kai Li,et al. The PARSEC benchmark suite: Characterization and architectural implications , 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).
[41] Carla Schlatter Ellis,et al. An analysis of dynamic page placement on a NUMA multiprocessor , 1992, SIGMETRICS '92/PERFORMANCE '92.
[42] Emmanuel Jeannot,et al. Near-Optimal Placement of MPI Processes on Hierarchical NUMA Architectures , 2010, Euro-Par.
[43] Michael Stumm,et al. Enhancing operating system support for multicore processors by using hardware performance monitoring , 2009, OPSR.
[44] José Duato,et al. Understanding Cache Hierarchy Contention in CMPs to Improve Job Scheduling , 2012, 2012 IEEE 26th International Parallel and Distributed Processing Symposium.
[45] Aamer Jaleel,et al. Analyzing Parallel Programs with PIN , 2010, Computer.