Paper Information - Calibrating Resource Allocation for Parallel Processing of Analytic Tasks

Cloud computing, the long-held dream of computing as a utility, has the potential to transform a large part of the IT industry, making software even more attractive as a service and shaping the way IT hardware is designed and purchased. In this environment, any application needs a model of the resources it uses, and achieving elasticity and the illusion of infinite capacity requires each of these resources to be virtualized to hide how they are multiplexed and shared. Given the dynamic nature of parallel processing, how the numbers of servers, CPUs, and cores are assigned to tasks has a great impact on the resource utilization of a PaaS (Platform as a Service) provider. In this paper, we address the challenge of automated calibration of resource allocation for parallel processing of analytic tasks. The proposed framework does not assume the availability of data statistics or application semantics; it assumes only that the tradeoff between parallelism benefits and overheads can be probed. To implement it, a Sampling-then-Calibrating algorithm is presented that samples runtime statistics and calibrates the resource allocation accordingly. Experiments validate the effectiveness of our approach.

Wen-Syan Li | Jianfeng Yan

[1] Luc Bouganim, et al. Dynamic Load Balancing in Hierarchical Parallel Database Systems, 1996, VLDB.

[2] Erhard Rahm, et al. Dynamic Multi-Resource Load Balancing in Parallel Database Systems, 1995, VLDB.

[3] Randy H. Katz, et al. Above the Clouds: A Berkeley View of Cloud Computing, 2009.

[4] John L. Gustafson, et al. Reevaluating Amdahl's law, 1988, CACM.

[5] Doug Johnson, et al. Computing in the Clouds, 2010.

[6] David J. DeWitt, et al. GAMMA - A High Performance Dataflow Database Machine, 1986, VLDB.

[7] Hidehiko Tanaka, et al. An Overview of The System Software of A Parallel Relational Database Machine GRACE, 1986, VLDB.

[8] Kenneth Ward Church, et al. On Delivering Embarrassingly Distributed Cloud Services, 2008, HotNets.

[9] G. Amdahl, et al. Validity of the single processor approach to achieving large scale computing capabilities, 1967, AFIPS '67 (Spring).

[10] Raghu Ramakrishnan, et al. Database Management Systems, 1976.

[11] Chandra Krintz, et al. AppScale Design and Implementation, 2009.

[12] Peter M. G. Apers, et al. Parallelism in a Main-Memory DBMS: The Performance of PRISMA/DB, 1992, VLDB.

[13] Archana Ganapathi, et al. Predicting Multiple Metrics for Queries: Better Decisions Enabled by Machine Learning, 2009, 2009 IEEE 25th International Conference on Data Engineering.

[14] Sanjay Ghemawat, et al. MapReduce: Simplified Data Processing on Large Clusters, 2004, OSDI.