Shenango: Achieving High CPU Efficiency for Latency-sensitive Datacenter Workloads
Hari Balakrishnan | Joshua Fried | Adam Belay | Jonathan Behrens | Amy Ousterhout
[1] David G. Andersen,et al. Using RDMA efficiently for key-value services , 2015, SIGCOMM 2015.
[2] Adam Wierman,et al. Open Versus Closed: A Cautionary Tale , 2006, NSDI.
[3] Karthikeyan Sankaralingam,et al. Dark Silicon and the End of Multicore Scaling , 2012, IEEE Micro.
[4] Amin Vahdat,et al. Chronos: predictable low latency for data center applications , 2012, SoCC '12.
[5] Ashish Gupta,et al. The RAMCloud Storage System , 2015, ACM Trans. Comput. Syst.
[6] Amin Vahdat,et al. Carousel: Scalable Traffic Shaping at End Hosts , 2017, SIGCOMM.
[7] Adrian Schüpbach,et al. The multikernel: a new OS architecture for scalable multicore systems , 2009, SOSP '09.
[8] Benjamin Hindman,et al. Composing parallel software efficiently with Lithe , 2010, PLDI '10.
[9] Purificacion Matute,et al. Transmission control protocol: DARPA internet program protocol specification , 1981 .
[10] Christoforos E. Kozyrakis,et al. Shinjuku: Preemptive Scheduling for μsecond-scale Tail Latency , 2019, NSDI.
[11] Thomas F. Wenisch,et al. Power management of online data-intensive services , 2011, 2011 38th Annual International Symposium on Computer Architecture (ISCA).
[12] David E. Culler,et al. SEDA: an architecture for well-conditioned, scalable internet services , 2001, SOSP.
[13] Lingjia Tang,et al. Treadmill: Attributing the Source of Tail Latency through Precise Load Testing and Statistical Inference , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).
[14] Sylvia Ratnasamy,et al. SoftNIC: A Software NIC to Augment Hardware , 2015 .
[15] Eunyoung Jeong,et al. mTCP: a Highly Scalable User-level TCP Stack for Multicore Systems , 2014, NSDI.
[16] Luiz André Barroso,et al. The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines , 2009 .
[17] Christoforos E. Kozyrakis,et al. Heracles: Improving resource efficiency at scale , 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).
[18] Robert Grimm,et al. Application performance and flexibility on exokernel systems , 1997, SOSP.
[19] Edouard Bugnion,et al. ZygOS: Achieving Low Tail Latency for Microsecond-scale Networked Tasks , 2017, SOSP.
[20] Sameh Elnikety,et al. PerfIso: Performance Isolation for Commercial Latency-Sensitive Services , 2018, USENIX Annual Technical Conference.
[21] Miguel Castro,et al. FaRM: Fast Remote Memory , 2014, NSDI.
[22] Tony Tung,et al. Scaling Memcache at Facebook , 2013, NSDI.
[23] Abhishek Verma,et al. Large-scale cluster management at Google with Borg , 2015, EuroSys.
[24] George C. Necula,et al. Capriccio: scalable threads for internet services , 2003, SOSP '03.
[25] Christian Bienia,et al. Benchmarking modern multiprocessors , 2011 .
[26] Robert D. Blumofe,et al. Scheduling multithreaded computations by work stealing , 1994, Proceedings 35th Annual Symposium on Foundations of Computer Science.
[27] Mark Handley,et al. Network stack specialization for performance , 2015, SIGCOMM 2015.
[28] Luiz André Barroso,et al. The tail at scale , 2013, CACM.
[29] Xi Yang,et al. Elfen Scheduling: Fine-Grain Principled Borrowing from Latency-Critical Workloads Using Simultaneous Multithreading , 2016, USENIX Annual Technical Conference.
[30] Michael Kaminsky,et al. Datacenter RPCs can be General and Fast , 2018, NSDI.
[31] Brighten Godfrey,et al. DRILL: Micro Load Balancing for Low-latency Data Center Networks , 2017, SIGCOMM.
[32] Christoforos E. Kozyrakis,et al. Reconciling high server utilization and sub-millisecond quality-of-service , 2014, EuroSys '14.
[33] Jialin Li,et al. Tales of the Tail: Hardware, OS, and Application-level Sources of Tail Latency , 2014, SoCC.
[34] Paul E. McKenney,et al. RCU Usage In the Linux Kernel: One Decade Later , 2012 .
[35] D. Marr,et al. Hyper-Threading Technology Architecture and Microarchitecture , 2002 .
[36] Kushagra Vaid,et al. Azure Accelerated Networking: SmartNICs in the Public Cloud , 2018, NSDI.
[37] Xiao Zhang,et al. CPI2: CPU performance isolation for shared compute clusters , 2013, EuroSys '13.
[38] Virendra J. Marathe,et al. Callisto: co-scheduling parallel runtime systems , 2014, EuroSys '14.
[39] Vimalkumar Jeyakumar,et al. Juggler: a practical reordering resilient network stack for datacenters , 2016, EuroSys.
[40] Keqiang He,et al. Presto: Edge-based Load Balancing for Fast Datacenter Networks , 2015, Conference on Applications, Technologies, Architectures, and Protocols for Computer Communication.
[41] Kevin Klues,et al. Tessellation: space-time partitioning in a manycore client OS , 2009 .
[42] Scott Shenker,et al. Network Requirements for Resource Disaggregation , 2016, OSDI.
[43] Luigi Rizzo,et al. netmap: A Novel Framework for Fast Packet I/O , 2012, USENIX ATC.
[44] Corporate Unix Press. System V application binary interface (3rd ed.) , 1993 .
[45] Chenyang Lu,et al. Work stealing for interactive services to meet target latency , 2016, PPoPP.
[46] John Kubiatowicz,et al. Tessellation: Refactoring the OS around explicit resource containers with continuous adaptation , 2013, 2013 50th ACM/EDAC/IEEE Design Automation Conference (DAC).
[47] Nan Hua,et al. Andromeda: Performance, Isolation, and Velocity at Scale in Cloud Network Virtualization , 2018, NSDI.
[48] Timothy Roscoe,et al. Arrakis: The Operating System is the Control Plane , 2014, OSDI.
[49] Brian N. Bershad,et al. Scheduler activations: effective kernel support for the user-level management of parallelism , 1991, TOCS.
[50] David A. Maltz,et al. Data center TCP (DCTCP) , 2010, SIGCOMM 2010.
[51] Song Jiang,et al. Workload analysis of a large-scale key-value store , 2012, SIGMETRICS '12.
[52] Christoforos E. Kozyrakis,et al. Energy proportionality and workload consolidation for latency-critical applications , 2015, SoCC.
[53] James Reinders,et al. Intel threading building blocks - outfitting C++ for multi-core processor parallelism , 2007 .
[54] Hyeontaek Lim,et al. MICA: A Holistic Approach to Fast In-Memory Key-Value Storage , 2014, NSDI.
[55] Thomas E. Anderson,et al. FlexNIC: Rethinking Network DMA , 2015, HotOS.
[56] David A. Patterson,et al. Attack of the killer microseconds , 2017, Commun. ACM.
[57] Kevin Klues,et al. Improving per-node efficiency in the datacenter with new OS abstractions , 2011, SoCC.
[58] Anoop Gupta,et al. Process control and scheduling issues for multiprogrammed shared-memory multiprocessors , 1989, SOSP '89.
[59] Amer Diwan,et al. Performance Analysis of Cloud Applications , 2018, NSDI.
[60] Mendel Rosenblum,et al. It's Time for Low Latency , 2011, HotOS.
[61] Jonathan Adams,et al. Magazines and Vmem: Extending the Slab Allocator to Many CPUs and Arbitrary Resources , 2001, USENIX Annual Technical Conference, General Track.
[62] Qian Li,et al. Arachne: Core-Aware Thread Management , 2018, OSDI.
[63] Katerina J. Argyraki,et al. ResQ: Enabling SLOs in Network Function Virtualization , 2018, NSDI.
[64] Donald E. Porter,et al. Rethinking the library OS from the top down , 2011, ASPLOS XVI.