Managing Tail Latency in Datacenter-Scale File Systems Under Production Constraints
Ricardo Bianchini | Willy Zwaenepoel | Alvin R. Lebeck | Íñigo Goiri | María F. Borge | Pulkit A. Misra
[1] Ju Wang,et al. Windows Azure Storage: a highly available cloud storage service with strong consistency , 2011, SOSP.
[2] D. Andersen,et al. A Fast Array of Wimpy Nodes , 2008 .
[3] Albert G. Greenberg,et al. Reining in the Outliers in Map-Reduce Clusters using Mantri , 2010, OSDI.
[4] Michael I. Jordan,et al. The SCADS Director: Scaling a Distributed Storage System Under Stringent Performance Requirements , 2011, FAST.
[5] Mor Harchol-Balter,et al. PriorityMeister: Tail Latency QoS for Shared Networked Storage , 2014, SoCC.
[6] Zhengping Qian,et al. Pado: A Data Processing Engine for Harnessing Transient Resources in Datacenters , 2017, EuroSys.
[7] Lingjia Tang,et al. Bubble-flux: precise online QoS management for increased utilization in warehouse scale computers , 2013, ISCA.
[8] T. S. Eugene Ng,et al. Understanding the effects and implications of compute node related failures in hadoop , 2012, HPDC '12.
[9] Christina Delimitrou,et al. Quasar: resource-efficient and QoS-aware cluster management , 2014, ASPLOS.
[10] Marco Canini,et al. Rein: Taming Tail Latency in Key-Value Stores via Multiget Scheduling , 2017, EuroSys.
[11] Amar Phanishayee,et al. FAWN: a fast array of wimpy nodes , 2009, SOSP '09.
[12] Riyaz Jamadar,et al. Dynamic Slot Allocation Optimization Framework for MapReduce Clusters , 2016 .
[13] Sanjay Ghemawat,et al. The Google file system , 2003 .
[14] Ricardo Bianchini,et al. History-Based Harvesting of Spare Cycles and Storage in Large-Scale Datacenters , 2016, OSDI.
[15] Anees Shaikh,et al. Performance Isolation and Fairness for Multi-Tenant Cloud Storage , 2012, OSDI.
[16] Andrea C. Arpaci-Dusseau,et al. Reducing File System Tail Latencies with Chopper , 2015, FAST.
[17] Andrew A. Chien,et al. The Tail at Store: A Revelation from Millions of Hours of Disk and SSD Deployments , 2016, FAST.
[18] Abhishek Verma,et al. Large-scale cluster management at Google with Borg , 2015, EuroSys.
[19] Adam Wierman,et al. Hopper: Decentralized Speculation-aware Cluster Scheduling at Scale , 2015, SIGCOMM.
[20] Rodrigo Fonseca,et al. Retro: Targeted Resource Management in Multi-tenant Distributed Systems , 2015, NSDI.
[21] Jie Xu,et al. Adaptive Speculation for Efficient Internetware Application Execution in Clouds , 2018, ACM Trans. Internet Techn..
[22] Emin Gün Sirer,et al. HyperDex: a distributed, searchable key-value store , 2012, SIGCOMM '12.
[23] Anja Feldmann,et al. C3: Cutting Tail Latency in Cloud Data Stores via Adaptive Replica Selection , 2015, NSDI.
[24] Sameh Elnikety,et al. PerfIso: Performance Isolation for Commercial Latency-Sensitive Services , 2018, USENIX Annual Technical Conference.
[25] Srikanth Kandula,et al. Speeding up distributed request-response workflows , 2013, SIGCOMM.
[26] Wei Jin,et al. Interposed proportional sharing for a storage service utility , 2004, SIGMETRICS '04/Performance '04.
[27] Ricardo Bianchini,et al. Scaling Distributed File Systems in Resource-Harvesting Datacenters , 2017, USENIX Annual Technical Conference.
[28] Michael J. Freedman,et al. Object Storage on CRAQ: High-Throughput Chain Replication for Read-Mostly Workloads , 2009, USENIX Annual Technical Conference.
[29] Zhen Cao,et al. On the Performance Variation in Modern Storage Stacks , 2017, FAST.
[30] Dahlia Malkhi,et al. CORFU: A Shared Log Design for Flash Clusters , 2012, NSDI.
[31] Anand Sivasubramaniam,et al. Storage performance virtualization via throughput and latency control , 2005, 13th IEEE International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems.
[32] Andrea C. Arpaci-Dusseau,et al. Split-level I/O scheduling , 2015, SOSP.
[33] Eben Hewitt. Cassandra - The Definitive Guide: Distributed Data at Web Scale , 2011 .
[34] Sanjay Ghemawat,et al. MapReduce: Simplified Data Processing on Large Clusters , 2004, OSDI.
[35] Michael Isard,et al. Autopilot: automatic data center management , 2007, OSR.
[36] Randy H. Katz,et al. Improving MapReduce Performance in Heterogeneous Environments , 2008, OSDI.
[37] Wonho Kim,et al. Kraken: Leveraging Live Traffic Tests to Identify and Resolve Resource Utilization Bottlenecks in Large Scale Web Services , 2016, OSDI.
[38] Jingren Zhou,et al. SCOPE: easy and efficient parallel processing of massive data sets , 2008, Proc. VLDB Endow..
[39] Randy H. Katz,et al. Cake: enabling high-level SLOs on shared storage systems , 2012, SoCC '12.
[40] Yin Wang,et al. Bistro: Scheduling Data-Parallel Jobs Against Live Production Systems , 2015, USENIX Annual Technical Conference.
[41] Scott Shenker,et al. Tachyon: Reliable, Memory Speed Storage for Cluster Computing Frameworks , 2014, SoCC.
[42] Bu-Sung Lee,et al. DynamicMR: A Dynamic Slot Allocation Optimization Framework for MapReduce Clusters , 2014, IEEE Transactions on Cloud Computing.
[43] Carlo Curino,et al. Morpheus: Towards Automated SLOs for Enterprise Clusters , 2016, OSDI.
[44] Zhe Wu,et al. CosTLO: Cost-Effective Redundancy for Lower Latency Variance on Cloud Storage Services , 2015, NSDI.
[45] Bo Fu,et al. PBSE: a robust path-based speculative execution for degraded-network tail tolerance in data-parallel frameworks , 2017, SoCC.
[46] Irfan Ahmad,et al. PARDA: Proportional Allocation of Resources for Distributed Storage Access , 2009, FAST.
[47] Jialin Li,et al. Tales of the Tail: Hardware, OS, and Application-level Sources of Tail Latency , 2014, SoCC.
[48] Gregory R. Ganger,et al. On the diversity of cluster workloads and its impact on research results , 2018, USENIX Annual Technical Conference.
[49] Scott Shenker,et al. Effective Straggler Mitigation: Attack of the Clones , 2013, NSDI.
[50] Christoforos E. Kozyrakis,et al. Heracles: Improving resource efficiency at scale , 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).
[51] Andrew A. Chien,et al. MittOS: Supporting Millisecond Tail Tolerance with Fast Rejecting SLO-Aware OS Interface , 2017, SOSP.
[52] Sanjay Ghemawat,et al. MapReduce: simplified data processing on large clusters , 2008, CACM.
[53] Luiz André Barroso,et al. The tail at scale , 2013, CACM.
[54] Ethan Katz-Bassett,et al. SPANStore: cost-effective geo-replicated storage spanning multiple cloud services , 2013, SOSP.
[55] Robbert van Renesse,et al. Chain Replication for Supporting High Throughput and Availability , 2004, OSDI.