μ Suite: A Benchmark Suite for Microservices

Modern On-Line Data Intensive (OLDI) applications have evolved from monolithic systems to instead comprise numerous, distributed microservices interacting via Remote Procedure Calls (RPCs). Microservices face single-digit millisecond RPC latency goals (implying sub-ms medians)—much tighter than their monolithic ancestors that must meet $\ge 100$ ms latency targets. Sub-ms-scale OS/network overheads that were once insignificant for such monoliths can now come to dominate in the sub-ms-scale microservice regime. It is therefore vital to characterize the influence of OS- and network-based effects on microservices. Unfortunately, widely-used academic data center benchmark suites are unsuitable to aid this characterization as they (1) use monolithic rather than microservice architectures, and (2) largely have request service times $\ge 100$ ms. In this paper, we investigate how OS and network overheads impact microservice median and tail latency by developing a complete suite of microservices called $ \mu$ Suite that we use to facilitate our study. $ \mu$ Suite comprises four OLDI services composed of microservices: image similarity search, protocol routing for key-value stores, set algebra on posting lists for document search, and recommender systems. Our characterization reveals that the relationship between optimal OS/network parameters and service load is complex. Our primary finding is that non-optimal OS scheduler decisions can degrade microservice tail latency by up to $\tilde 87$%.

[1]  Tony Tung,et al.  Scaling Memcache at Facebook , 2013, NSDI.

[2]  Alexandr Andoni,et al.  Practical and Optimal LSH for Angular Distance , 2015, NIPS.

[3]  Thomas F. Wenisch,et al.  Deconstructing the Tail at Scale Effect Across Network Protocols , 2017, ArXiv.

[4]  Piotr Indyk,et al.  Similarity Search in High Dimensions via Hashing , 1999, VLDB.

[5]  Hans-Peter Kriegel,et al.  The X-tree : An Index Structure for High-Dimensional Data , 2001, VLDB.

[6]  William B. March,et al.  MLPACK: a scalable C++ machine learning library , 2012, J. Mach. Learn. Res..

[7]  Rubby Casallas,et al.  Evaluating the monolithic and the microservice architecture pattern to deploy web applications in the cloud , 2015, 2015 10th Computing Colombian Conference (10CCC).

[8]  Masatoshi Yoshikawa,et al.  The A-tree: An Index Structure for High-Dimensional Spaces Using Relative Approximation , 2000, VLDB.

[9]  Huan Liu,et al.  Subspace clustering for high dimensional data: a review , 2004, SKDD.

[10]  Ronald G. Dreslinski,et al.  Sirius: An Open End-to-End Voice and Vision Personal Assistant and Its Implications for Future Warehouse Scale Computers , 2015, ASPLOS.

[11]  Mattan Erez,et al.  Dirigent: Enforcing QoS for Latency-Critical Tasks on Shared Multicore Systems , 2016, ASPLOS.

[12]  T. N. Vijaykumar,et al.  TimeTrader: Exploiting latency tail to save datacenter energy for online search , 2015, 2015 48th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[13]  Andrew McCallum,et al.  Efficient clustering of high-dimensional data sets with application to reference matching , 2000, KDD '00.

[14]  Mayank Bawa,et al.  LSH forest: self-tuning indexes for similarity search , 2005, WWW '05.

[15]  Panos Kalnis,et al.  Efficient and accurate nearest neighbor and closest pair search in high-dimensional space , 2010, TODS.

[16]  David M. Brooks,et al.  Applied Machine Learning at Facebook: A Datacenter Infrastructure Perspective , 2018, 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA).

[17]  Chin-Wan Chung,et al.  The GC-tree: a high-dimensional index structure for similarity search in image databases , 2002, IEEE Trans. Multim..

[18]  Seung-won Hwang,et al.  Delayed-Dynamic-Selective (DDS) Prediction for Reducing Extreme Tail Latency in Web Search , 2015, WSDM.

[19]  David A. Patterson,et al.  Attack of the killer microseconds , 2017, Commun. ACM.

[20]  Eric N. Herness,et al.  WebSphere Application Server: A foundation for on demand computing , 2004, IBM Syst. J..

[21]  Luis Ceze,et al.  NCAM: Near-Data Processing for Nearest Neighbor Search , 2015, MEMSYS.

[22]  Patrick Seemann,et al.  Matrix Factorization Techniques for Recommender Systems , 2014 .

[23]  David A. Patterson,et al.  A hardware evaluation of cache partitioning to improve utilization and energy-efficiency while preserving responsiveness , 2013, ISCA.

[24]  F. Maxwell Harper,et al.  The MovieLens Datasets: History and Context , 2016, TIIS.

[25]  Dmitry Namiot,et al.  On micro-services architecture , 2014 .

[26]  Christoforos E. Kozyrakis,et al.  Towards energy proportionality for large-scale latency-critical workloads , 2014, 2014 ACM/IEEE 41st International Symposium on Computer Architecture (ISCA).

[27]  Xiao Zhang,et al.  CPI2: CPU performance isolation for shared compute clusters , 2013, EuroSys '13.

[28]  Hans-Peter Kriegel,et al.  Density-Connected Subspace Clustering for High-Dimensional Data , 2004, SDM.

[29]  Luiz André Barroso,et al.  The tail at scale , 2013, CACM.

[30]  Panos Kalnis,et al.  Quality and efficiency in high dimensional nearest neighbor search , 2009, SIGMOD Conference.

[31]  Xiaola Lin,et al.  Analysis of optimal thread pool size , 2000, OPSR.

[32]  Christoforos E. Kozyrakis,et al.  Memory Hierarchy for Web Search , 2018, 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA).

[33]  Michael A. Casey,et al.  Locality-Sensitive Hashing for Finding Nearest Neighbors , 2008 .

[34]  David G. Lowe,et al.  Fast Approximate Nearest Neighbors with Automatic Algorithm Configuration , 2009, VISAPP.

[35]  Ricardo Bianchini,et al.  Few-to-Many: Incremental Parallelism for Reducing Tail Latency in Interactive Services , 2015, ASPLOS.

[36]  Rafail Ostrovsky,et al.  Efficient search for approximate nearest neighbor in high dimensional spaces , 1998, STOC '98.

[37]  William Pugh,et al.  Skip Lists: A Probabilistic Alternative to Balanced Trees , 1989, WADS.

[38]  Yuqing Zhu,et al.  BigDataBench: A big data benchmark suite from internet services , 2014, 2014 IEEE 20th International Symposium on High Performance Computer Architecture (HPCA).

[39]  Kai Li,et al.  The PARSEC benchmark suite: Characterization and architectural implications , 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).

[40]  Alexandr Andoni,et al.  Near-Optimal Hashing Algorithms for Approximate Nearest Neighbor in High Dimensions , 2006, 2006 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS'06).

[41]  Luiz André Barroso,et al.  Web Search for a Planet: The Google Cluster Architecture , 2003, IEEE Micro.

[42]  Kevin Skadron,et al.  Bubble-up: Increasing utilization in modern warehouse scale computers via sensible co-locations , 2011, 2011 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[43]  Adam Silberstein,et al.  Benchmarking cloud serving systems with YCSB , 2010, SoCC '10.

[44]  John L. Henning SPEC CPU2006 benchmark descriptions , 2006, CARN.

[45]  James McNames,et al.  A Fast Nearest-Neighbor Algorithm Based on a Principal Axis Search Tree , 2001, IEEE Trans. Pattern Anal. Mach. Intell..

[46]  Nicole Immorlica,et al.  Locality-sensitive hashing scheme based on p-stable distributions , 2004, SCG '04.

[47]  Marcin Zukowski,et al.  Super-Scalar RAM-CPU Cache Compression , 2006, 22nd International Conference on Data Engineering (ICDE'06).

[48]  Thomas F. Wenisch,et al.  Power management of online data-intensive services , 2011, 2011 38th Annual International Symposium on Computer Architecture (ISCA).

[49]  Ilan Shimshoni,et al.  Mean shift based clustering in high dimensions: a texture classification example , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[50]  Ron Kohavi,et al.  Practical guide to controlled experiments on the web: listen to your customers not to the hippo , 2007, KDD '07.

[51]  Edward Y. Chang,et al.  Clustering for Approximate Similarity Search in High-Dimensional Spaces , 2002, IEEE Trans. Knowl. Data Eng..

[52]  David E. Culler,et al.  SEDA: An Architecture for Scalable, Well-Conditioned Internet Services , 2001 .

[53]  Allan Kuchinsky,et al.  Quality is in the eye of the beholder: meeting users' requirements for Internet quality of service , 2000, CHI.

[54]  Babak Falsafi,et al.  Clearing the clouds: a study of emerging scale-out workloads on modern hardware , 2012, ASPLOS XVII.

[55]  Yu Liu,et al.  K-Means Clustering with Distributed Dimensions , 2016, ICML.

[56]  Daniel Sánchez,et al.  Tailbench: a benchmark suite and evaluation methodology for latency-critical applications , 2016, 2016 IEEE International Symposium on Workload Characterization (IISWC).

[57]  Christina Delimitrou,et al.  Paragon: QoS-aware scheduling for heterogeneous datacenters , 2013, ASPLOS '13.

[58]  Quan Chen,et al.  DjiNN and Tonic: DNN as a service and its implications for future warehouse scale computers , 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).

[59]  Martín Abadi,et al.  TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems , 2016, ArXiv.

[60]  T. N. Vijaykumar,et al.  Deadline-aware datacenter tcp (D2TCP) , 2012, CCRV.

[61]  Carla E. Brodley,et al.  Random Projection for High Dimensional Data Clustering: A Cluster Ensemble Approach , 2003, ICML.

[62]  Piotr Indyk,et al.  Approximate nearest neighbors: towards removing the curse of dimensionality , 1998, STOC '98.

[63]  Christian Böhm,et al.  Searching in high-dimensional spaces: Index structures for improving the performance of multimedia databases , 2001, CSUR.

[64]  Christian Böhm,et al.  Independent quantization: an index compression technique for high-dimensional data spaces , 2000, Proceedings of 16th International Conference on Data Engineering (Cat. No.00CB37073).

[65]  Shiliang Hu,et al.  LASER: Light, Accurate Sharing dEtection and Repair , 2016, 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA).

[66]  Christoforos E. Kozyrakis,et al.  Heracles: Improving resource efficiency at scale , 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).

[67]  Zhe Wang,et al.  Multi-Probe LSH: Efficient Indexing for High-Dimensional Similarity Search , 2007, VLDB.

[68]  Sergey Ioffe,et al.  Rethinking the Inception Architecture for Computer Vision , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[69]  Tanaka Yuzuru,et al.  Spherical LSH for Approximate Nearest Neighbor Search on Unit Hypersphere , 2007 .

[70]  Berkant Barla Cambazoglu,et al.  Impact of response latency on user behavior in web search , 2014, SIGIR.

[71]  Maria Kihl,et al.  Web server performance modeling using an M/G/1/K*PS queue , 2003, 10th International Conference on Telecommunications, 2003. ICT 2003..

[72]  Christoforos E. Kozyrakis,et al.  Energy proportionality and workload consolidation for latency-critical applications , 2015, SoCC.

[73]  Brad Fitzpatrick,et al.  Distributed caching with memcached , 2004 .

[74]  Zhe Wang,et al.  Modeling LSH for performance tuning , 2008, CIKM '08.

[75]  Lingjia Tang,et al.  Treadmill: Attributing the Source of Tail Latency through Precise Load Testing and Statistical Inference , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).

[76]  Trevor Darrell,et al.  Fast pose estimation with parameter-sensitive hashing , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[77]  David G. Lowe,et al.  Scalable Nearest Neighbor Algorithms for High Dimensional Data , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[78]  Hui Ding,et al.  TAO: Facebook's Distributed Data Store for the Social Graph , 2013, USENIX Annual Technical Conference.

[79]  Brahim Medjahed,et al.  A Query Rewriting Approach for Web Service Composition , 2010, IEEE Transactions on Services Computing.

[80]  Dan Tsafrir,et al.  The context-switch overhead inflicted by hardware interrupts (and the enigma of do-nothing loops) , 2007, ExpCS '07.

[81]  Mike Amundsen,et al.  Microservice Architecture: Aligning Principles, Practices, and Culture , 2016 .

[82]  Ronald G. Dreslinski,et al.  Adrenaline: Pinpointing and reining in tail queries with quick voltage boosting , 2015, 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA).

[83]  Gu-Yeon Wei,et al.  Tradeoffs between power management and tail latency in warehouse-scale applications , 2014, 2014 IEEE International Symposium on Workload Characterization (IISWC).

[84]  Thomas F. Wenisch,et al.  µTune: Auto-Tuned Threading for OLDI Microservices , 2018, OSDI.

[85]  Andrew W. Moore,et al.  An Investigation of Practical Approximate Nearest Neighbor Algorithms , 2004, NIPS.