Bridging the tenant-provider gap in cloud services

The disconnect between the resource-centric interface exposed by today's cloud providers and tenant goals hurts both entities. Tenants are encumbered by having to translate their performance and cost goals into the corresponding resource requirements, while providers lose revenue because tenants select resources without adequate information. Instead, we argue for a "job-centric" cloud whereby tenants only specify high-level goals regarding their jobs and applications. To illustrate our ideas, we present Bazaar, a cloud framework offering a job-centric interface for data analytics applications. Bazaar allows tenants to express high-level goals and predicts the resources needed to achieve them. Since multiple resource combinations may achieve the same goal, Bazaar chooses the combination most suitable for the provider. Using large-scale simulations and deployment on a Hadoop cluster, we demonstrate that Bazaar enables a symbiotic tenant-provider relationship: tenants achieve their performance goals while holistic resource selection benefits providers in the form of increased goodput.
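The selection step described above can be illustrated with a minimal sketch. It is not Bazaar's actual implementation: the `Allocation` type, the toy completion-time model in `predict_completion_s`, and the `provider_cost` weighting are all hypothetical assumptions chosen only to show the idea of filtering resource combinations by a tenant goal and then picking the one that is cheapest for the provider.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class Allocation:
    """Hypothetical resource combination: VM count and per-VM bandwidth."""
    vms: int
    bandwidth_mbps: int


def predict_completion_s(alloc: Allocation, input_gb: float) -> float:
    """Toy performance model (an assumption, not the paper's model).

    Compute time shrinks with the number of VMs; shuffle time shrinks
    with the aggregate network bandwidth.
    """
    compute_s = 600.0 * input_gb / alloc.vms
    # input_gb gigabytes is 8000 * input_gb megabits.
    shuffle_s = 8000.0 * input_gb / (alloc.vms * alloc.bandwidth_mbps)
    return compute_s + shuffle_s


def provider_cost(alloc: Allocation) -> float:
    """Hypothetical provider-side cost: VMs weighted by bandwidth use."""
    return alloc.vms * (1.0 + alloc.bandwidth_mbps / 1000.0)


def choose_allocation(
    candidates: Iterable[Allocation],
    input_gb: float,
    deadline_s: float,
    cost: Callable[[Allocation], float] = provider_cost,
) -> Optional[Allocation]:
    """Return the feasible allocation with the lowest provider cost.

    An allocation is feasible if its predicted completion time meets the
    tenant's deadline. Returns None when no candidate is feasible.
    """
    feasible = [
        a for a in candidates
        if predict_completion_s(a, input_gb) <= deadline_s
    ]
    if not feasible:
        return None
    return min(feasible, key=cost)


if __name__ == "__main__":
    candidates = [
        Allocation(v, b) for v in (10, 20, 40) for b in (100, 500, 1000)
    ]
    print(choose_allocation(candidates, input_gb=100.0, deadline_s=4000.0))
```

In this sketch the tenant supplies only the input size and a deadline; the choice among feasible combinations is left entirely to the provider-side cost function, which could be swapped for any other measure of suitability.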
