On the Performance of the Spotify Backend

We model and evaluate the performance of a distributed key-value storage system that is part of the Spotify backend. Spotify is an on-demand music streaming service that offers low-latency access to a library of over 20 million tracks and currently serves over 20 million users. We first present a simplified model of the Spotify storage architecture to make its analysis tractable. We then introduce an analytical model for the distribution of the response time, a key metric in the Spotify service. We parameterize and validate the model using measurements from two different testbed configurations and from the operational Spotify infrastructure. We find that, within the range of normal load patterns, the model is accurate: measurements are within 11% of predictions. In addition, we model the capacity of the Spotify storage system under different object allocation policies and find that measurements on our testbed are within 9% of the model predictions. The model helps us justify the object allocation policy adopted for the Spotify storage system.
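The abstract does not spell out the model itself. As a rough illustration only, and explicitly not the paper's model, the sketch below uses two textbook ingredients. The first is the response-time distribution P(T <= t) of a single M/M/1 first-come-first-served queue, where the sojourn time is exponential with rate mu - lambda. The second is the capacity bound imposed by the most heavily loaded server under a given object-to-server allocation. The function names, the M/M/1 assumption, the one-replica-per-object assumption, and all numbers are hypothetical.

```python
import math
import random
from typing import Sequence


def mm1_response_time_cdf(t: float, arrival_rate: float, service_rate: float) -> float:
    """P(T <= t) for the sojourn time T of an M/M/1 FCFS queue.

    Assumption for illustration: T ~ Exponential(service_rate - arrival_rate).
    """
    if arrival_rate >= service_rate:
        raise ValueError("unstable queue: arrival_rate must be below service_rate")
    if t < 0:
        return 0.0
    return 1.0 - math.exp(-(service_rate - arrival_rate) * t)


def capacity_under_allocation(
    popularity: Sequence[float],
    assignment: Sequence[int],
    num_servers: int,
    server_capacity: float,
) -> float:
    """Largest total request rate before the busiest server reaches server_capacity.

    popularity[i] is the relative request rate of object i; assignment[i] is the
    server that stores it (one replica per object, a simplifying assumption).
    """
    load = [0.0] * num_servers
    for weight, server in zip(popularity, assignment):
        load[server] += weight
    # Fraction of all requests that land on the busiest server.
    busiest_share = max(load) / sum(popularity)
    return server_capacity / busiest_share


if __name__ == "__main__":
    n_objects, n_servers, per_server = 10_000, 4, 100.0  # hypothetical numbers
    popularity = [1.0] * n_objects  # equally popular objects, for simplicity
    round_robin = [i % n_servers for i in range(n_objects)]
    rng = random.Random(42)
    uniform_random = [rng.randrange(n_servers) for _ in range(n_objects)]

    print("M/M/1, P(T <= 20 ms):", mm1_response_time_cdf(0.020, 50.0, 100.0))
    print("round-robin capacity:",
          capacity_under_allocation(popularity, round_robin, n_servers, per_server))
    print("random capacity:     ",
          capacity_under_allocation(popularity, uniform_random, n_servers, per_server))
```

Because the busiest server always carries at least the average share 1/num_servers, no allocation can exceed num_servers * server_capacity. Round-robin placement of equally popular objects reaches that bound, while uniform random placement typically falls somewhat short of it.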
