Dynamic concurrency throttling on NUMA systems and data migration impacts
Guilherme Korol | Antonio Carlos Schneider Beck | Mateus Beck Rutzig | Michael Guilherme Jordan | Janaina Schwarzrock | Arthur Francisco Lorenzon | Charles Cardoso de Oliveira
[1] Philippe Olivier Alexandre Navaux,et al. Potential Gains in EDP by Dynamically Adapting the Number of Threads for OpenMP Applications in Embedded Systems , 2017, 2017 VII Brazilian Symposium on Computing Systems Engineering (SBESC).
[2] Samuel Thibault,et al. Structuring the execution of OpenMP applications for multicore architectures , 2010, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS).
[3] Dimitrios S. Nikolopoulos,et al. Application-Level Energy Awareness for OpenMP , 2015, IWOMP.
[4] Peter Arbenz,et al. Introduction to Parallel Computing (Oxford Texts in Applied and Engineering Mathematics) , 2004 .
[5] Vivien Quéma,et al. Thread and Memory Placement on NUMA Systems: Asymmetry Matters , 2015, USENIX Annual Technical Conference.
[6] Israel Koren,et al. Affinity-Based Thread and Data Mapping in Shared Memory Systems , 2016, ACM Comput. Surv..
[7] Gurindar S. Sohi,et al. Adaptive, efficient, parallel execution of parallel programs , 2014 .
[8] Philippe Olivier Alexandre Navaux,et al. Locality vs. Balance: Exploring Data Mapping Policies on NUMA Systems , 2015, 2015 23rd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing.
[9] Dimitrios S. Nikolopoulos,et al. Prediction-Based Power-Performance Adaptation of Multithreaded Scientific Codes , 2008, IEEE Transactions on Parallel and Distributed Systems.
[10] Antonio Carlos Schneider Beck,et al. LAANT: A library to automatically optimize EDP for OpenMP applications , 2017, Design, Automation & Test in Europe Conference & Exhibition (DATE), 2017.
[11] Hermann Härtig,et al. Measuring energy consumption for short code paths using RAPL , 2012, PERV.
[12] George Ho,et al. PAPI: A Portable Interface to Hardware Performance Counters , 1999 .
[13] Philippe Olivier Alexandre Navaux,et al. kMAF: Automatic kernel-level management of thread and data affinity , 2014, 2014 23rd International Conference on Parallel Architecture and Compilation (PACT).
[14] Wei Wang,et al. Predicting the memory bandwidth and optimal core allocations for multi-threaded applications on large-scale NUMA machines , 2016, 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[15] Jaejin Lee,et al. Performance characterization of the NAS Parallel Benchmarks in OpenCL , 2011, 2011 IEEE International Symposium on Workload Characterization (IISWC).
[16] Feedback-driven threading , 2008 .
[17] Michael J. Quinn,et al. Parallel programming in C with MPI and OpenMP , 2003 .
[18] Brice Goglin,et al. ForestGOMP: An Efficient OpenMP Environment for NUMA Architectures , 2010, International Journal of Parallel Programming.
[19] Antonio Carlos Schneider Beck,et al. Investigating different general-purpose and embedded multicores to achieve optimal trade-offs between performance and energy , 2016, J. Parallel Distributed Comput..
[20] Steven K. Reinhardt,et al. The impact of resource partitioning on SMT processors , 2003, 2003 12th International Conference on Parallel Architectures and Compilation Techniques.
[21] Daniele De Sensi. Predicting Performance and Power Consumption of Parallel Applications , 2016, 2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP).
[22] Onur Mutlu,et al. Bottleneck identification and scheduling in multithreaded applications , 2012, ASPLOS XVII.
[23] Vivien Quéma,et al. Traffic management: a holistic approach to memory placement on NUMA systems , 2013, ASPLOS '13.
[24] Scott A. Mahlke,et al. When less is more (LIMO): controlled parallelism for improved efficiency , 2012, CASES '12.
[25] Antonio Carlos Schneider Beck,et al. Aurora: Seamless Optimization of OpenMP Applications , 2019, IEEE Transactions on Parallel and Distributed Systems.
[26] David H. Bailey,et al. The NAS parallel benchmarks summary and preliminary results , 1991, Proceedings of the 1991 ACM/IEEE Conference on Supercomputing (Supercomputing '91).
[27] Stephen L. Olivier,et al. Power Measurement and Concurrency Throttling for Energy Reduction in OpenMP Programs , 2013, 2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum.
[28] Laxmi N. Bhuyan,et al. Thread reinforcer: Dynamically determining number of threads via OS level monitoring , 2011, 2011 IEEE International Symposium on Workload Characterization (IISWC).
[29] Philippe Olivier Alexandre Navaux,et al. Characterizing communication and page usage of parallel applications for thread and data mapping , 2015, Perform. Evaluation.
[30] Dimitrios S. Nikolopoulos,et al. Online power-performance adaptation of multithreaded programs using hardware event-based prediction , 2006, ICS '06.
[31] Nathan Clark,et al. Thread tailor: dynamically weaving threads together for efficient, adaptive parallel applications , 2010, ISCA.
[32] Antonio Carlos Schneider Beck,et al. Optimized Use of Parallel Programming Interfaces in Multithreaded Embedded Architectures , 2015, 2015 IEEE Computer Society Annual Symposium on VLSI.
[33] Marco Danelutto,et al. A Reconfiguration Algorithm for Power-Aware Parallel Applications , 2016, ACM Trans. Archit. Code Optim..
[34] Barbara M. Chapman,et al. ARCS: Adaptive Runtime Configuration Selection for Power-Constrained OpenMP Applications , 2016, 2016 IEEE International Conference on Cluster Computing (CLUSTER).
[35] Antonio Carlos Schneider Beck,et al. Parallel Computing Hits the Power Wall - Principles, Challenges, and a Survey of Solutions , 2019, Springer Briefs in Computer Science.
[36] Onur Mutlu,et al. MISE: Providing performance predictability and improving fairness in shared main memory systems , 2013, 2013 IEEE 19th International Symposium on High Performance Computer Architecture (HPCA).
[37] Luigi Carro,et al. Adaptable Embedded Systems , 2012 .
[38] Yale N. Patt,et al. Feedback-driven threading: power-efficient and high-performance execution of multi-threaded workloads on CMPs , 2008, ASPLOS.
[39] W. P. Petersen,et al. Introduction to Parallel Computing , 2004 .
[40] Jaejin Lee,et al. Adaptive execution techniques for SMT multiprocessor architectures , 2005, PPOPP.