Nebu: A topology-aware deployment system for reliable virtualized multi-cluster environments

Petabytes of data are processed daily by distributed applications built upon Hadoop and MongoDB. A significant fraction of these applications use cloud infrastructure to cope with this vast amount of data. Commercial clouds use virtualized environments, but most distributed applications are designed on the assumption that they run on physical hardware. When this assumption no longer holds, neither do the guarantees for an application’s reliability and performance. To remedy this issue, we design a system called Nebu. Nebu provides information about the physical topology of the cloud to the distributed application and automates the deployment of virtual machines and applications. Nebu performs these tasks without depending on any single distributed application or virtual machine manager. Instead, Nebu provides APIs that make it easy to add compatibility with many popular distributed applications and virtual machine managers. We develop Nebu as an open-source project using modern software engineering practices. In particular, we use the agile development method Scrum in combination with the Kanban scheduling system. We apply iterative API design through the use of RAML and supporting UML diagrams. Because no effective methods for testing distributed applications have been developed, we use both unit testing and manual testing. We also apply automated regression testing through the use of continuous integration. Because there are no formal guidelines on how to validate distributed applications of the kind we investigate in this work, we perform real-world experiments with multiple distributed applications on an enterprise multi-cluster infrastructure. These experiments show that Nebu enables applications to give guarantees about reliability without degrading their performance.
To increase Nebu’s usability, we provide extensions that offer compatibility with the distributed applications Hadoop and MongoDB, and with the virtual machine manager VMware. Both the system and its extensions are developed in an enterprise environment, which holds promise that Nebu will be adopted by open-source communities as well as by industry.
