Environment-Sensitive Performance Tuning for Distributed Service Orchestration

Modern distributed systems are designed to tolerate unreliable environments: they aim to keep providing service even when the underlying hardware or network fails. Such failures can nevertheless degrade performance significantly, and this should be taken into account when services are deployed. In this paper, we present an approach that optimizes the performance of distributed systems in unreliable deployment environments by searching for optimal configuration parameters. To simulate an unreliable environment, we inject failures into the environment of a service application, such as node crashes in the cluster, network failures between nodes, and resource contention on nodes. We then use a search algorithm to automatically find the best-performing parameters within a user-selected parameter space, under the unreliable environment we created. We have implemented our approach in a testing-based framework and applied it to several well-known distributed service systems.
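The workflow described above (inject faults, measure performance, search the parameter space) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the fault models in `FAULTS`, the parameters `timeout_ms` and `max_retries`, the toy latency model, and the failure penalty are hypothetical, and plain random search stands in for the search algorithm, which the abstract does not specify. It is not the framework's actual implementation.

```python
import random

# Hypothetical fault scenarios, each reduced to two effects on a request:
# added latency and the probability that an attempt is lost.
FAULTS = {
    "node_crash": {"extra_latency_ms": 0.0, "drop_prob": 0.30},
    "network_failure": {"extra_latency_ms": 80.0, "drop_prob": 0.10},
    "resource_contention": {"extra_latency_ms": 40.0, "drop_prob": 0.0},
}

# Hypothetical user-selected parameter space.
PARAM_SPACE = {
    "timeout_ms": [50, 100, 200, 400],
    "max_retries": [0, 1, 2, 3],
}

FAILURE_PENALTY_MS = 1000.0  # assumed cost of a request that never succeeds


def simulate_request(params, fault, rng):
    """Return the total latency of one request under one injected fault."""
    waited = 0.0
    for _ in range(params["max_retries"] + 1):
        latency = rng.uniform(10.0, 30.0) + fault["extra_latency_ms"]
        dropped = rng.random() < fault["drop_prob"]
        if not dropped and latency <= params["timeout_ms"]:
            return waited + latency
        # The attempt was lost or too slow: the client waits a full timeout.
        waited += params["timeout_ms"]
    return waited + FAILURE_PENALTY_MS


def measure(params, seed=0, requests=200):
    """Mean latency over all fault scenarios (lower is better)."""
    rng = random.Random(seed)  # fixed seed keeps measurements repeatable
    latencies = [
        simulate_request(params, fault, rng)
        for fault in FAULTS.values()
        for _ in range(requests)
    ]
    return sum(latencies) / len(latencies)


def random_search(space, trials=30, seed=0):
    """Sample configurations at random and keep the best one measured."""
    rng = random.Random(seed)
    best_params, best_cost = None, float("inf")
    for _ in range(trials):
        candidate = {name: rng.choice(values) for name, values in space.items()}
        cost = measure(candidate)
        if cost < best_cost:
            best_params, best_cost = candidate, cost
    return best_params, best_cost


if __name__ == "__main__":
    params, cost = random_search(PARAM_SPACE)
    print("best configuration:", params)
    print("mean latency under faults (ms):", round(cost, 1))
```

In a real deployment, `measure` would run a workload against the service while faults are injected, rather than evaluate a toy model. The search loop would keep the same shape.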
