Abstracting Multi-Core Topologies with MCTOP