Feedback control of server instances for right sizing in the cloud

We consider a computing system based on summoning server instances on the fly, possibly from a remote cloud service. A feedback rule must be designed to track the exogenous load with the right service capacity, taking into account the inherent lags in server creation and deletion. We use fluid and diffusion approximations of queueing models to analyze control schemes that manage the tradeoff between job queueing and idle capacity in the large-scale limit. In particular, we propose a method in which the system can achieve negligible queueing while minimizing idle capacity. Theoretical results are supported by simulations.