Coriolis: Scalable VM Clustering in Clouds

The growing popularity of virtualized data centers and clouds has led to virtual machine sprawl, which significantly increases system management costs. We present Coriolis, a scalable system that analyzes virtual machine images and automatically clusters them based on content and/or semantic similarity. Image similarity analysis can improve the planning of many management activities (e.g., migration, system administration, VM placement) and reduce their execution cost. However, clustering images by similarity, whether content or semantic, requires large-scale data processing and scales poorly. Coriolis uses (i) asymmetric similarity semantics and (ii) a hierarchical clustering approach whose data access requirement is linear in the number of images. This is a significant improvement over conventional clustering approaches, which incur a cost quadratic in the number of images and therefore become prohibitively expensive in a cloud setting.
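To make the asymmetric-similarity idea concrete, the following is a minimal Python sketch under stated assumptions, not Coriolis's implementation. It assumes, hypothetically, that each image is reduced to a set of content fingerprints (e.g., file hashes) and uses set containment, |A ∩ B| / |A|, as the asymmetric measure. The clustering is a simple single-pass "leader" scheme substituted for illustration, not the paper's hierarchical algorithm. It shows how reading each image's content exactly once keeps data access linear in the number of images. The names `containment`, `leader_cluster`, and the 0.8 threshold are illustrative.

```python
from typing import Dict, FrozenSet, List


def containment(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Asymmetric similarity: fraction of a's content that also appears in b.

    Note containment(a, b) != containment(b, a) in general.
    """
    if not a:
        return 0.0
    return len(a & b) / len(a)


def leader_cluster(
    images: Dict[str, FrozenSet[str]], threshold: float = 0.8
) -> List[List[str]]:
    """Single-pass leader clustering (illustrative, hypothetical).

    Each image's content is read once and compared only against the
    in-memory cluster leaders, never against every other image, so the
    number of image reads is linear in len(images).
    """
    leaders: List[FrozenSet[str]] = []
    clusters: List[List[str]] = []
    for name, content in images.items():
        for i, leader in enumerate(leaders):
            # Join the first cluster whose leader contains most of this image.
            if containment(content, leader) >= threshold:
                clusters[i].append(name)
                break
        else:
            # No leader is similar enough: this image starts a new cluster.
            leaders.append(content)
            clusters.append([name])
    return clusters


# Hypothetical fingerprint sets for three images.
images = {
    "web": frozenset({"libc", "bash", "coreutils", "nginx"}),
    "base": frozenset({"libc", "bash", "coreutils"}),
    "db": frozenset({"libc", "postgres", "python"}),
}

print(containment(images["base"], images["web"]))  # 1.0
print(containment(images["web"], images["base"]))  # 0.75
print(leader_cluster(images))  # [['web', 'base'], ['db']]
```

The asymmetry is what lets "base" join the cluster led by "web": the minimal image is fully contained in the derived one, even though the reverse does not hold. A symmetric measure such as Jaccard similarity (3/4 here) would treat both directions identically.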