CLAP: Component-Level Approximate Processing for Low Tail Latency and High Result Accuracy in Cloud Online Services

Modern latency-critical online services such as search engines often process a request by consulting large input data distributed across many parallel components, so the tail latency of these components determines the service latency. To trade result accuracy for tail-latency reduction, existing techniques use only the components that respond before a specified deadline to produce an approximate result. However, these techniques skip a large proportion of components as load grows, incurring large accuracy losses. In this paper, we propose CLAP, which enables component-level approximate processing of requests for low tail latency and small accuracy losses. CLAP aggregates information from the input data to create small aggregated data points. Using these points, CLAP reduces the latency variance of parallel components and allows them to produce initial results quickly; CLAP also identifies the parts of the input data most relevant to a request's result accuracy and uses those parts first to improve the produced results, minimizing accuracy losses. We evaluated CLAP using real services and datasets. The results show that (i) compared to existing exact-processing techniques, CLAP reduces tail latency by a factor of 6.46 with accuracy losses of 2.2 percent, and (ii) at the same latency, CLAP reduces accuracy losses by a factor of 31.58 compared to existing approximate-processing techniques.
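The abstract's core idea can be sketched as deadline-driven refinement within a single component: return an initial answer computed from the small aggregated data points immediately, then refine it using input-data partitions in descending order of estimated relevance until the deadline expires. The following is a minimal illustrative sketch, not CLAP's actual implementation; the function names, the `refine` callback, and the per-partition relevance scores are all assumptions introduced for illustration.

```python
import time


def process_request(summary_result, partitions, relevance, deadline_s, refine):
    """Deadline-driven approximate processing at one parallel component.

    summary_result : coarse result precomputed from the aggregated data points,
                     available immediately (bounds the component's latency)
    partitions     : list of input-data partitions held by this component
    relevance      : relevance[i] estimates how much partitions[i] affects
                     the request's result accuracy (higher = more relevant)
    deadline_s     : per-request processing deadline in seconds
    refine         : callback (result, partition) -> improved result
    """
    start = time.monotonic()
    result = summary_result  # an initial, approximate answer exists up front
    # Visit the partitions most relevant to result accuracy first, so that
    # whatever work fits under the deadline recovers the most accuracy.
    order = sorted(range(len(partitions)), key=lambda i: relevance[i], reverse=True)
    for i in order:
        if time.monotonic() - start >= deadline_s:
            break  # deadline reached: return the best result obtained so far
        result = refine(result, partitions[i])
    return result
```

With a generous deadline the result converges to the exact answer; under load, a shorter deadline simply truncates the refinement loop rather than dropping whole components, which is the distinction the abstract draws against prior deadline-based techniques.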
