Optimized application placement for network congestion and failure resiliency in clouds

We propose OX, a runtime system that shields applications from network congestion and failures in shared Cloud data centers. OX enables customers to deploy network-intensive data analytics frameworks within existing infrastructures by protecting co-hosted QoS-constrained applications from network interference and performance degradation. Moreover, OX reduces the vulnerability of all applications to hardware failures, such as rack power outages. OX transparently discovers application topologies by monitoring network traffic among application components (virtual machines). In addition, OX allows application owners to specify groups of virtual machines that must remain highly available, according to component roles and replication semantics. Based on this information, OX builds on-line topology graphs for applications and incrementally partitions these graphs across the infrastructure to optimize communication between virtual machines and enforce availability constraints. We show the benefits of OX in a realistic shared Cloud data center setting using a mix of Hadoop and YCSB/Cassandra workloads.
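To make the placement idea concrete, the following is a minimal sketch, not OX's actual algorithm: it swaps the incremental graph partitioning described above for a simple greedy heuristic. It assumes a traffic graph given as measured byte counts between VM pairs, availability groups whose members must land on distinct racks, and racks with a uniform VM capacity. All names (`place_vms`, `traffic`, `ha_groups`, `capacity`) are hypothetical and chosen for illustration only.

```python
from collections import defaultdict


def place_vms(traffic, ha_groups, racks, capacity):
    """Greedy sketch: co-locate heavy talkers on one rack while keeping
    members of the same availability group on different racks.

    traffic   -- dict mapping (vm_a, vm_b) to observed bytes exchanged
    ha_groups -- list of sets of VMs that must not share a rack
    racks     -- list of rack identifiers
    capacity  -- maximum number of VMs per rack (assumed uniform)
    """
    # Map each VM to the ids of the availability groups it belongs to.
    groups_of = defaultdict(set)
    for gid, group in enumerate(ha_groups):
        for vm in group:
            groups_of[vm].add(gid)

    placement = {}                          # vm -> rack
    load = {r: 0 for r in racks}            # rack -> number of VMs placed
    groups_on = {r: set() for r in racks}   # rack -> group ids present

    def fits(vm, rack):
        # A rack is feasible if it has room and hosts no VM from any of
        # this VM's availability groups.
        return load[rack] < capacity and not (groups_of[vm] & groups_on[rack])

    def place(vm, preferred=None):
        if vm in placement:
            return
        # Try the peer's rack first, then racks from least to most loaded.
        candidates = [] if preferred is None else [preferred]
        candidates += sorted(racks, key=lambda r: load[r])
        for rack in candidates:
            if fits(vm, rack):
                placement[vm] = rack
                load[rack] += 1
                groups_on[rack] |= groups_of[vm]
                return
        raise ValueError(f"no feasible rack for {vm}")

    # Visit edges from heaviest to lightest so that the most
    # communication-intensive pairs get the first chance to share a rack.
    for (a, b), _ in sorted(traffic.items(), key=lambda kv: -kv[1]):
        place(a, placement.get(b))
        place(b, placement.get(a))
    return placement


if __name__ == "__main__":
    # Hypothetical workload: three chatty Hadoop workers and two
    # Cassandra replicas that must survive a single rack outage.
    traffic = {("h1", "h2"): 900, ("h2", "h3"): 800,
               ("c1", "c2"): 50, ("c1", "h1"): 5}
    result = place_vms(traffic, [{"c1", "c2"}], ["r1", "r2"], capacity=4)
    print(result)
```

In this example the three Hadoop workers end up on the same rack, so their heavy traffic stays local, while the two Cassandra replicas are split across racks. A greedy pass like this places each VM only once; a system that adapts to changing traffic, as described above, would also need to revisit placements and migrate VMs.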
