ZeroVM: secure distributed processing for big data analytics

A key challenge for any large-scale computation today, whether in “big data” processing or in large-scale web services, is the management of data. In the big data context, the arbitrary separation of storage and computation increases latency and decreases performance. ZeroVM is a lightweight, container-based virtualization platform that provides deterministic process execution and isolation. The philosophy behind ZeroVM is to virtualize applications and then move the application to the data. This makes it possible to transform or process data in situ, rather than moving data to where the application is located. By moving and executing applications next to the data, ZeroVM challenges the conventional wisdom of infrastructure-centric computing models and enables more data-centric computing models for big data analytics. The ZeroVM distributed processing framework proposed in this paper presents new opportunities for processing, storing, and using data, particularly in big data analytics.
