Paper information - μ Suite: A Benchmark Suite for Microservices

μ Suite: A Benchmark Suite for Microservices

Modern On-Line Data Intensive (OLDI) applications have evolved from monolithic systems to instead comprise numerous, distributed microservices interacting via Remote Procedure Calls (RPCs). Microservices face single-digit millisecond RPC latency goals (implying sub-ms medians), much tighter than those of their monolithic ancestors, which must meet $\ge 100$ ms latency targets. Sub-ms-scale OS/network overheads that were once insignificant for such monoliths can now come to dominate in the sub-ms-scale microservice regime. It is therefore vital to characterize the influence of OS- and network-based effects on microservices. Unfortunately, widely used academic data center benchmark suites are unsuitable to aid this characterization as they (1) use monolithic rather than microservice architectures, and (2) largely have request service times $\ge 100$ ms. In this paper, we investigate how OS and network overheads impact microservice median and tail latency by developing a complete suite of microservices called μ Suite that we use to facilitate our study. μ Suite comprises four OLDI services composed of microservices: image similarity search, protocol routing for key-value stores, set algebra on posting lists for document search, and recommender systems. Our characterization reveals that the relationship between optimal OS/network parameters and service load is complex. Our primary finding is that non-optimal OS scheduler decisions can degrade microservice tail latency by up to $\sim 87\%$.

Thomas F. Wenisch | Akshitha Sriraman
