Distributed Parameter Optimization Scheduling Strategy and System Design Based on Mesos

This paper implements a distributed parameter optimization system on Mesos and studies its scheduling strategy. Using the Mesos resource interface, the system packages a variety of common parameter optimization algorithms and task scheduling strategies into framework software that runs on Mesos. To suit the two-level scheduling mechanism of Mesos, a dynamic scheduling strategy is proposed for the distributed parameter optimization system in a multi-job environment on a hybrid deployment cluster. Several experiments compare the resource scheduling strategy of the framework software with the FIFO scheduling strategy in the hybrid deployment scenario. This work reduces the difficulty of running distributed parameter optimization in a cluster environment for common scenarios such as deep learning, and improves resource utilization efficiency in multi-task environments.
