Calibrating Resource Allocation for Parallel Processing of Analytic Tasks

Cloud computing, the long-held dream of computing as a utility, has the potential to transform a large part of the IT industry, making software even more attractive as a service and shaping the way IT hardware is designed and purchased. In this environment, any application needs a model of computation, a model of storage, and a model of communication, and achieving elasticity and the illusion of infinite capacity requires each of these resources to be virtualized to hide how they are multiplexed and shared. Given the dynamic nature of parallel processing, the number of servers, CPUs, and cores assigned to each task has a great impact on the resource utilization of a PaaS (Platform as a Service) provider. In this paper, we address the challenge of automatically calibrating resource allocation for the parallel processing of analytic tasks. The proposed framework does not assume the availability of data statistics or application semantics; it assumes only that the tradeoff between parallelism benefits and overheads can be probed at runtime. To implement it, we present a Sampling-then-Calibrating algorithm that samples runtime statistics and calibrates the resource allocation accordingly. Experiments validate the effectiveness of our approach.
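The abstract names the Sampling-then-Calibrating idea without giving its details. The sketch below is one hypothetical way to realize "sample, then calibrate" under stated assumptions; it is not the authors' algorithm. It assumes a task's runtime at parallelism degree p roughly follows T(p) = a + b/p + c*p, where b/p models the parallelism benefit and c*p the coordination overhead. It times the task at a few probe degrees, fits a, b, and c by least squares, and picks the degree with the lowest predicted runtime. The cost model, the probe degrees, and the names `fit_cost_model` and `sample_then_calibrate` are illustrative assumptions.

```python
from typing import Callable, List, Tuple


def fit_cost_model(samples: List[Tuple[int, float]]) -> Tuple[float, float, float]:
    """Least-squares fit of the ASSUMED model T(p) = a + b/p + c*p to (p, runtime) samples."""
    n = 3
    # Accumulate the normal equations (X^T X) w = X^T y for features [1, 1/p, p].
    xtx = [[0.0] * n for _ in range(n)]
    xty = [0.0] * n
    for p, t in samples:
        row = [1.0, 1.0 / p, float(p)]
        for i in range(n):
            xty[i] += row[i] * t
            for j in range(n):
                xtx[i][j] += row[i] * row[j]
    # Solve the 3x3 system by Gaussian elimination with partial pivoting.
    m = [xtx[i] + [xty[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    # Back substitution, from the last unknown to the first.
    w = [0.0] * n
    for i in reversed(range(n)):
        w[i] = (m[i][n] - sum(m[i][k] * w[k] for k in range(i + 1, n))) / m[i][i]
    return w[0], w[1], w[2]


def sample_then_calibrate(
    run: Callable[[int], float], probe_degrees: List[int], max_degree: int
) -> int:
    """Hypothetical sketch: probe a few parallelism degrees, then choose the best one."""
    # Sampling phase: measure the runtime at each probe degree (needs >= 3 distinct degrees).
    samples = [(p, run(p)) for p in probe_degrees]
    # Calibration phase: fit the benefit/overhead model and minimize its prediction.
    a, b, c = fit_cost_model(samples)
    return min(range(1, max_degree + 1), key=lambda p: a + b / p + c * p)
```

For example, with a synthetic, noise-free workload `lambda p: 5 + 100 / p + p`, probing degrees `[1, 2, 4, 8]` recovers the model and selects 10 workers, the point where the marginal benefit of another worker no longer outweighs its overhead. On real tasks the probe runs would be noisy and more samples would be needed.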