Optimizing distributed actor systems for dynamic interactive services

Distributed actor systems are widely used for developing scalable interactive cloud services, such as social networks and online games. By modeling an application as a dynamic set of lightweight communicating "actors", developers can easily build complex distributed applications, while the underlying runtime system handles the low-level complexities of a distributed environment. We present ActOp, a data-driven, application-independent runtime mechanism for optimizing the end-to-end service latency of actor-based distributed applications. ActOp targets the two dominant factors affecting latency: the overhead of remote inter-actor communication across servers, and the intra-server queuing delay. ActOp automatically identifies frequently communicating actors and migrates them to the same server, transparently to the running application. The migration decisions are driven by a novel scalable distributed graph partitioning algorithm that does not rely on a single server to store the whole communication graph, thereby enabling efficient actor placement even for applications with rapidly changing graphs (e.g., chat services). Further, each server autonomously reduces its queuing delay by learning an internal queuing model and configuring threads according to the instantaneous request rate and application demands. We prototype ActOp by integrating it with Orleans, a popular open-source actor system [4, 13]. Experiments with realistic workloads show latency improvements of up to 75% for the 99th percentile and up to 63% for the mean, with up to a 2x increase in peak system throughput.
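The idea of moving frequently communicating actors onto the same server can be illustrated with a toy sketch. The code below is not ActOp's distributed partitioning algorithm: it is a centralized greedy heuristic that sees the whole communication graph, which is exactly what the abstract says ActOp avoids. All names (`remote_messages`, `best_migration`, `colocate`, the `capacity` limit) are hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict


def remote_messages(edges, placement):
    """Total message count on edges whose endpoints sit on different servers."""
    return sum(w for (a, b), w in edges.items() if placement[a] != placement[b])


def best_migration(edges, placement, capacity):
    """Find the single actor move that most reduces remote traffic.

    Returns (actor, target_server, gain), or None if no move helps.
    `capacity` is an assumed per-server actor limit used as a crude
    stand-in for load balancing.
    """
    load = Counter(placement.values())
    # affinity[actor][server] = messages exchanged with actors on that server
    affinity = defaultdict(Counter)
    for (a, b), w in edges.items():
        affinity[a][placement[b]] += w
        affinity[b][placement[a]] += w

    best = None
    for actor, per_server in affinity.items():
        home = placement[actor]
        for server, weight in per_server.items():
            if server == home or load[server] >= capacity:
                continue
            # Edges to `server` become local; edges to `home` become remote.
            gain = weight - per_server[home]
            if gain > 0 and (best is None or gain > best[2]):
                best = (actor, server, gain)
    return best


def colocate(edges, placement, capacity):
    """Apply greedy moves until none reduces remote traffic any further."""
    placement = dict(placement)
    while (move := best_migration(edges, placement, capacity)) is not None:
        actor, server, _ = move
        placement[actor] = server
    return placement


if __name__ == "__main__":
    # Hypothetical message counts between actor pairs.
    edges = {("a", "b"): 10, ("b", "c"): 1, ("c", "d"): 10}
    start = {"a": "s1", "b": "s2", "c": "s1", "d": "s2"}
    final = colocate(edges, start, capacity=3)
    print(remote_messages(edges, start), remote_messages(edges, final), final)
```

Each accepted move strictly lowers the remote message count, so the loop terminates. A real system would additionally have to work from partial, per-server views of the graph and account for migration cost, neither of which this sketch attempts.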

[1] David E. Culler et al. An architecture for highly concurrent, well-conditioned internet services, 2002.

[2] Hui Ding et al. TAO: how Facebook serves the social graph, 2012, SIGMOD Conference.

[3] David R. Karger et al. Koorde: A Simple Degree-Optimal Distributed Hash Table, 2003, IPTPS.

[4] Ju Wang et al. Windows Azure Storage: a highly available cloud storage service with strong consistency, 2011, SOSP.

[5] Divyakant Agrawal et al. Efficient Computation of Frequent and Top-k Elements in Data Streams, 2005, ICDT.

[6] Dimitri P. Bertsekas et al. Data Networks, 1986.

[7] Harald Räcke et al. Optimal hierarchical decompositions for congestion minimization in networks, 2008, STOC.

[8] Minor Gordon et al. Stage scheduling for CPU-intensive servers, 2010.

[9] Robert Krauthgamer et al. Partitioning graphs into balanced components, 2009, SODA.

[10] Werner Vogels et al. Dynamo: Amazon's highly available key-value store, 2007, SOSP.

[11] David E. Culler et al. SEDA: an architecture for well-conditioned, scalable internet services, 2001, SOSP.

[12] Brad Fitzpatrick et al. Distributed caching with memcached, 2004.

[13] Vipin Kumar et al. A Fast and High Quality Multilevel Scheme for Partitioning Irregular Graphs, 1998, SIAM J. Sci. Comput.

[14] John Zic et al. Auto-tune design and evaluation on staged event-driven architecture, 2006, MODDM '06.

[15] Pablo Rodriguez et al. The little engine(s) that could: scaling online social networks, 2012, TNET.

[16] Frank Thomson Leighton et al. Multicommodity max-flow min-cut theorems and their use in designing approximation algorithms, 1999, JACM.

[17] Gabriel Kliot et al. Streaming graph partitioning for large distributed graphs, 2012, KDD.

[18] Tony Tung et al. Scaling Memcache at Facebook, 2013, NSDI.

[19] Philip A. Bernstein et al. Orleans: Distributed Virtual Actors for Programmability and Scalability, 2014.

[20] Guy E. Blelloch et al. GraphChi: Large-Scale Graph Computation on Just a PC, 2012, OSDI.

[21] Amir H. Payberah et al. JA-BE-JA: A Distributed Algorithm for Balanced Graph Partitioning, 2013, 2013 IEEE 7th International Conference on Self-Adaptive and Self-Organizing Systems.