CosTLO: Cost-Effective Redundancy for Lower Latency Variance on Cloud Storage Services

We present CosTLO, a system that reduces the high latency variance associated with cloud storage services by augmenting GET/PUT requests issued by end-hosts with redundant requests, so that the earliest response can be considered. To reduce the cost overhead imposed by redundancy, unlike prior efforts that have used this approach, CosTLO combines the use of multiple forms of redundancy. Since this results in a large number of configurations in which CosTLO can issue redundant requests, we conduct a comprehensive measurement study on S3 and Azure to identify the configurations that are viable in practice. Informed by this study, we design CosTLO to satisfy any application's goals for latency variance by 1) estimating the latency variance offered by any particular configuration, 2) efficiently searching through the configuration space to select a cost-effective configuration among the ones that can offer the desired latency variance, and 3) preserving data consistency despite CosTLO's use of redundant requests. We show that, for the median PlanetLab node, CosTLO can halve the latency variance associated with fetching content from Amazon S3, with only a 25% increase in cost.

[1]  Luiz André Barroso,et al.  The tail at scale , 2013, CACM.

[2]  Ethan Katz-Bassett,et al.  SPANStore: cost-effective geo-replicated storage spanning multiple cloud services , 2013, SOSP.

[3]  Albert G. Greenberg,et al.  VL2: a scalable and flexible data center network , 2009, SIGCOMM '09.

[4]  Tony Tung,et al.  Scaling Memcache at Facebook , 2013, NSDI.

[5]  D. Zats,et al.  DeTail: reducing the flow completion time tail in datacenter networks , 2012, CCRV.

[6]  Hakim Weatherspoon,et al.  RACS: a case for cloud storage diversity , 2010, SoCC '10.

[7]  Alec Wolman,et al.  Volley: Automated Data Placement for Geo-Distributed Cloud Services , 2010, NSDI.

[8]  Werner Vogels,et al.  Dynamo: amazon's highly available key-value store , 2007, SOSP.

[9]  A. Rowstron,et al.  Towards predictable datacenter networks , 2011, SIGCOMM.

[10]  Miguel Correia,et al.  DepSky: Dependable and Secure Storage in a Cloud-of-Clouds , 2013, TOS.

[11]  Xiaowei Yang,et al.  CloudCmp: comparing public cloud providers , 2010, IMC '10.

[12]  Arun Venkataramani,et al.  iPlane: an information plane for distributed services , 2006, OSDI '06.

[13]  Brighten Godfrey,et al.  Low latency via redundancy , 2013, CoNEXT.

[14]  Pramod Bhatotia,et al.  Orchestrating the Deployment of Computations in the Cloud with Conductor , 2012, NSDI.

[15]  Keqiang He,et al.  Next stop, the cloud: understanding modern web service deployment in EC2 and azure , 2013, Internet Measurement Conference.

[16]  Hovav Shacham,et al.  Hey, you, get off of my cloud: exploring information leakage in third-party compute clouds , 2009, CCS.

[17]  Wilson C. Hsieh,et al.  Bigtable: A Distributed Storage System for Structured Data , 2006, TOCS.

[18]  I. Stoica,et al.  FairCloud: sharing the network in cloud computing , 2011, CCRV.

[19]  Ramakrishna Kotla,et al.  SafeStore: A Durable and Practical Storage System , 2007, USENIX Annual Technical Conference.

[20]  Dinan Gunawardena,et al.  Chatty Tenants and the Cloud Network Sharing Problem , 2013, NSDI.

[21]  Ju Wang,et al.  Windows Azure Storage: a highly available cloud storage service with strong consistency , 2011, SOSP.

[22]  Michael I. Jordan,et al.  Characterizing, modeling, and generating workload spikes for stateful services , 2010, SoCC '10.

[23]  Antony I. T. Rowstron,et al.  IOFlow: a software-defined storage architecture , 2013, SOSP.

[24]  Hari Balakrishnan,et al.  Improving web availability for clients with MONET , 2005, NSDI.

[25]  David Wetherall,et al.  Studying Black Holes in the Internet with Hubble , 2008, NSDI.

[26]  Antony I. T. Rowstron,et al.  Better never than late: meeting deadlines in datacenter networks , 2011, SIGCOMM.

[27]  Brighten Godfrey,et al.  Finishing flows quickly with preemptive scheduling , 2012, CCRV.

[28]  Anja Feldmann,et al.  C3: Cutting Tail Latency in Cloud Data Stores via Adaptive Replica Selection , 2015, NSDI.

[29]  Rui Wang,et al.  Towards social user profiling: unified and discriminative influence model for inferring home locations , 2012, KDD.

[30]  Brice Augustin,et al.  Avoiding traceroute anomalies with Paris traceroute , 2006, IMC '06.