Reducing tail latency using duplication: a multi-layered approach
Ihsan Ayyub Qazi | Muhammad Asim Jamshed | Fahad R. Dogar | Hafiz Mohsin Bashir | Abdullah Bin Faisal | Peter Vondras | Ali Musa Iftikhar
[1] Tanakorn Leesatapornwongsa, et al. What Bugs Live in the Cloud? A Study of 3000+ Issues in Cloud Systems, 2014, SoCC.
[2] Albert G. Greenberg, et al. Reining in the Outliers in Map-Reduce Clusters using Mantri, 2010, OSDI.
[3] Alex X. Liu, et al. Friends, not Foes – Synthesizing Existing Transport Strategies for Data Center Networks, 2014.
[4] Dan Feng, et al. CDRM: A Cost-Effective Dynamic Replication Management Scheme for Cloud Storage Cluster, 2010, IEEE International Conference on Cluster Computing.
[5] Scott Shenker, et al. Effective Straggler Mitigation: Attack of the Clones, 2013, NSDI.
[6] Zhe Wu, et al. CosTLO: Cost-Effective Redundancy for Lower Latency Variance on Cloud Storage Services, 2015, NSDI.
[7] T. N. Vijaykumar, et al. Deadline-aware datacenter TCP (D2TCP), 2012, SIGCOMM '12.
[8] Anja Feldmann, et al. C3: Cutting Tail Latency in Cloud Data Stores via Adaptive Replica Selection, 2015, NSDI.
[9] Hari Balakrishnan, et al. Restructuring endpoint congestion control, 2018, SIGCOMM.
[10] Amer Diwan, et al. Performance Analysis of Cloud Applications, 2018, NSDI.
[11] Christopher Stewart, et al. Zoolander: Efficiently Meeting Very Strict, Low-Latency SLOs, 2013, ICAC.
[12] Bo Fu, et al. PBSE: a robust path-based speculative execution for degraded-network tail tolerance in data-parallel frameworks, 2017, SoCC.
[13] Irfan Ahmad, et al. PARDA: Proportional Allocation of Resources for Distributed Storage Access, 2009, FAST.
[14] Nick McKeown, et al. pFabric: minimal near-optimal datacenter transport, 2013, SIGCOMM.
[15] Adam Wierman, et al. How to Determine a Good Multi-Programming Level for External Scheduling, 2006, 22nd International Conference on Data Engineering (ICDE'06).
[16] Luigi Rizzo, et al. netmap: A Novel Framework for Fast Packet I/O, 2012, USENIX ATC.
[17] Peter Steenkiste, et al. Architecting for edge diversity: supporting rich services over an unbundled transport, 2012, CoNEXT '12.
[18] Andrea C. Arpaci-Dusseau, et al. Analysis of HDFS under HBase: a Facebook messages case study, 2014, FAST.
[19] Adam Wierman, et al. Grass: Trimming Stragglers in Approximation Analytics, 2014, NSDI.
[20] Robert B. Ross, et al. Fail-Slow at Scale, 2018, ACM Trans. Storage.
[21] Albert G. Greenberg, et al. VL2: a scalable and flexible data center network, 2009, SIGCOMM '09.
[22] David E. Culler, et al. SEDA: an architecture for well-conditioned, scalable internet services, 2001, SOSP.
[23] Brighten Godfrey, et al. Low latency via redundancy, 2013, CoNEXT.
[24] Dan Pei, et al. Fast and Cautious: Leveraging Multi-path Diversity for Transport Loss Recovery in Data Centers, 2016, USENIX ATC.
[25] Libin Liu, et al. RepNet: Cutting Latency with Flow Replication in Data Center Networks, 2018, IEEE Transactions on Services Computing.
[26] Mor Harchol-Balter, et al. Scheduling for efficiency and fairness in systems with redundancy, 2017, Perform. Evaluation.
[27] John W. Byers, et al. Judicious QoS using cloud overlays, 2019, CoNEXT.
[28] Adam Wierman, et al. Hopper: Decentralized Speculation-aware Cluster Scheduling at Scale, 2015, SIGCOMM.
[29] Srikanth Kandula, et al. Leveraging endpoint flexibility in data-intensive clusters, 2013, SIGCOMM.
[30] Albert G. Greenberg, et al. Scarlett: coping with skewed content popularity in MapReduce clusters, 2011, EuroSys '11.
[31] Bianca Schroeder, et al. sRoute: Treating the Storage Stack Like a Network, 2016, FAST.
[32] Antony I. T. Rowstron, et al. Decentralized task-aware scheduling for data center networks, 2014, SIGCOMM.
[33] Haitao Wu, et al. Enabling ECN in Multi-Service Multi-Queue Data Centers, 2016, NSDI.
[34] Albert G. Greenberg, et al. Data center TCP (DCTCP), 2010, SIGCOMM '10.
[35] Srinivasan Seshan, et al. XIA: Efficient Support for Evolvable Internetworking, 2012, NSDI.
[36] Rodrigo Fonseca, et al. Retro: Targeted Resource Management in Multi-tenant Distributed Systems, 2015, NSDI.
[37] Brian D. Noble, et al. Bobtail: Avoiding Long Tails in the Cloud, 2013, NSDI.
[38] Patrick Wendell, et al. Sparrow: distributed, low latency scheduling, 2013, SOSP.
[39] Hitesh Ballani, et al. End-to-end Performance Isolation Through Virtual Datacenters, 2014, OSDI.
[40] Michael I. Jordan, et al. The SCADS Director: Scaling a Distributed Storage System Under Stringent Performance Requirements, 2011, FAST.
[41] Rodrigo Fonseca, et al. Pivot tracing, 2018, USENIX ATC.
[42] Zartash Afzal Uzmi, et al. Workload adaptive flow scheduling, 2018, CoNEXT.
[43] Ning Zhang, et al. ERMS: An Elastic Replication Management System for HDFS, 2012, IEEE International Conference on Cluster Computing Workshops.
[44] Fahad R. Dogar, et al. Leveraging the Power of Cloud for Reliable Wide Area Communication, 2015, HotNets.
[45] Carlos Maltzahn, et al. Malacology: A Programmable Storage System, 2017, EuroSys.
[46] Baochun Li, et al. RepFlow: Minimizing flow completion times with replicated flows in data centers, 2013, IEEE INFOCOM 2014.
[47] Antony I. T. Rowstron, et al. IOFlow: a software-defined storage architecture, 2013, SOSP.
[48] Fahad R. Dogar, et al. Measuring and Improving the Reliability of Wide-Area Cloud Paths, 2017, WWW.
[49] Rodrigo Fonseca, et al. Principled workflow-centric tracing of distributed systems, 2016, SoCC.
[50] Ihsan Ayyub Qazi, et al. Towards a Redundancy-Aware Network Stack for Data Centers, 2016, HotNets.
[51] Donald Beaver, et al. Dapper, a Large-Scale Distributed Systems Tracing Infrastructure, 2010.
[52] Randy H. Katz, et al. Improving MapReduce Performance in Heterogeneous Environments, 2008, OSDI.
[53] Cristina L. Abad, et al. DARE: Adaptive Data Replication for Efficient Cluster Scheduling, 2011, IEEE International Conference on Cluster Computing.
[54] Luiz André Barroso, et al. The tail at scale, 2013, CACM.
[55] Haryadi S. Gunawi, et al. Why Does the Cloud Stop Computing? Lessons from Hundreds of Service Outages, 2016, SoCC.
[56] Yongqiang Xiong, et al. ClickNP: Highly Flexible and High Performance Network Processing with Reconfigurable Hardware, 2016, SIGCOMM.
[57] Wei Bai, et al. Information-Agnostic Flow Scheduling for Commodity Data Centers, 2015, NSDI.
[58] Ihsan Ayyub Qazi, et al. Load balancing over symmetric virtual topologies, 2017, IEEE INFOCOM.
[59] Kannan Ramchandran, et al. A “Hitchhiker’s” Guide to Fast and Efficient Data Reconstruction in Erasure-coded Data Centers, 2014.
[60] Eunyoung Jeong, et al. mTCP: a Highly Scalable User-level TCP Stack for Multicore Systems, 2014, NSDI.
[61] Andrew A. Chien, et al. MittOS: Supporting Millisecond Tail Tolerance with Fast Rejecting SLO-Aware OS Interface, 2017, SOSP.
[62] George Parisis, et al. Trevi: watering down storage hotspots with cool fountain codes, 2013, HotNets.
[63] George Varghese, et al. P4: programming protocol-independent packet processors, 2013, CCR.