Maximizing system utilization via parallelism management for co-located parallel applications
Younghyun Cho | Bernhard Egger | Camilo A. Celis Guzman