Nebula: Distributed Edge Cloud for Data Intensive Computing

Centralized cloud infrastructures have become popular platforms for data-intensive computing. However, because their resources are centralized, they suffer from inefficient data mobility and are poorly suited to geo-distributed data-intensive applications whose data may be spread across multiple geographical locations. In this paper, we present Nebula, a dispersed edge cloud infrastructure that explores the use of voluntary resources for both computation and data storage. We describe the lightweight Nebula architecture, which enables distributed data-intensive computing through a number of optimization techniques, including location-aware data and computation placement, replication, and recovery. We evaluate Nebula's performance on an emulated volunteer platform that spans over 50 PlanetLab nodes distributed across Europe, and show how a common data-intensive computing framework, MapReduce, can be easily deployed and run on Nebula. We show that Nebula MapReduce is robust to a wide array of failures and substantially outperforms other wide-area versions based on emulated existing systems.