Predictive Job Scheduling under Uncertain Constraints in Cloud Computing

Capacity management has always been a major challenge for cloud platforms because massive numbers of heterogeneous on-demand instances run at different times. To better plan capacity for the whole platform, a class of cloud computing instances has been released to collect computing demands beforehand. With such instances, users submit jobs that run for a pre-specified, uninterrupted duration within a flexible time range in the future, at a discount compared to normal on-demand instances. Proactively scheduling those pre-collected job requests in light of the platform's capacity status can greatly help balance computing workloads over time. In this work, we formulate the scheduling problem for these pre-collected job requests under uncertain available capacity as a Prediction + Optimization problem with uncertainty in constraints, and propose an effective algorithm called Controlling under Uncertain Constraints (CUC), in which the predicted capacity guides the optimization of job scheduling and the scheduling results are in turn leveraged to improve the capacity prediction through Bayesian optimization. The proposed formulation and solution are broadly applicable to proactive scheduling problems in cloud computing. Our extensive experiments on three public, industrial datasets show that CUC has great potential for supporting high reliability in cloud platforms.
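To make the problem setting concrete, the following is a minimal sketch of scheduling fixed-duration jobs inside flexible time windows against a predicted capacity curve. It is not the CUC algorithm: it uses a simple greedy heuristic, and the names `Job`, `schedule`, `overloaded_slots`, and the `margin` parameter are hypothetical choices made for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Job:
    name: str
    demand: int      # capacity units requested (hypothetical unit, e.g. cores)
    duration: int    # uninterrupted run length in time slots, assumed >= 1
    earliest: int    # first slot in which the job may start
    latest_end: int  # slot by which the job must have finished (exclusive)


def schedule(jobs: List[Job], predicted_capacity: List[float],
             margin: float = 0.0) -> Dict[str, Optional[int]]:
    """Greedy placement: map each job name to a start slot, or None if it does not fit."""
    # Hold back a fraction of the predicted capacity as a buffer against forecast error.
    remaining = [c * (1.0 - margin) for c in predicted_capacity]
    starts: Dict[str, Optional[int]] = {}
    # Place the largest jobs (demand x duration) first.
    for job in sorted(jobs, key=lambda j: j.demand * j.duration, reverse=True):
        best_start, best_slack = None, -1.0
        last_start = min(job.latest_end, len(remaining)) - job.duration
        for s in range(job.earliest, last_start + 1):
            # Slack is the spare capacity in the tightest slot of the candidate window.
            slack = min(remaining[s:s + job.duration]) - job.demand
            if slack >= 0 and slack > best_slack:
                best_start, best_slack = s, slack
        if best_start is not None:
            for t in range(best_start, best_start + job.duration):
                remaining[t] -= job.demand
        starts[job.name] = best_start
    return starts


def overloaded_slots(jobs: List[Job], starts: Dict[str, Optional[int]],
                     actual_capacity: List[float]) -> int:
    """Count slots where the scheduled load exceeds the capacity actually observed."""
    load = [0.0] * len(actual_capacity)
    for job in jobs:
        s = starts.get(job.name)
        if s is None:
            continue
        for t in range(s, s + job.duration):
            load[t] += job.demand
    return sum(1 for used, cap in zip(load, actual_capacity) if used > cap)


jobs = [Job("A", demand=6, duration=2, earliest=0, latest_end=6),
        Job("B", demand=6, duration=2, earliest=0, latest_end=6)]
predicted = [10, 10, 4, 4, 10, 10]
plan = schedule(jobs, predicted)
```

In this sketch, a schedule that is feasible under the predicted capacity can still overload a slot once the actual capacity is observed, which is the uncertainty in constraints the abstract refers to. A buffer such as `margin` is one simple knob that could be tuned from observed scheduling outcomes; the paper instead uses scheduling results to improve the capacity prediction itself through Bayesian optimization.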
