Getting Started with Spark

Cluster computing has seen the rise of a new class of popular computing models in which clusters execute data-parallel computations on unreliable machines, enabled by software systems that provide locality-aware scheduling, fault tolerance, and load balancing. MapReduce [13] pioneered this model, while systems such as Map-Reduce-Merge [15] and Dryad [11] generalized it to other types of data flows. These systems are scalable and fault tolerant because their programming model lets users build acyclic data flow graphs that pass input data through a set of operations. Because the data flow is fixed in advance, the system can schedule work and recover from faults without user intervention. While this model suits many applications, there are problems that acyclic data flows cannot solve efficiently, most notably iterative computations that reuse a working set of data across many steps, which is the motivation behind Spark [7].
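
As a concrete illustration, the sketch below is written against the current Spark Scala API (SparkConf, SparkContext, textFile, filter, cache, count); the object name and the HDFS path are placeholders chosen for this example, not taken from the Spark paper. It shows a simple acyclic flow over a log file and then reuses the cached working set for a second query, the kind of reuse that purely acyclic, disk-based data flows handle poorly.

    import org.apache.spark.{SparkConf, SparkContext}

    object GettingStarted {
      def main(args: Array[String]): Unit = {
        // Local mode for experimentation; on a cluster the master URL would differ.
        val conf = new SparkConf().setAppName("GettingStarted").setMaster("local[*]")
        val sc   = new SparkContext(conf)

        // Acyclic data flow: read input, filter it, count the results.
        val lines  = sc.textFile("hdfs://...")        // placeholder input path
        val errors = lines.filter(_.contains("ERROR"))
        errors.cache()                                // keep the working set in memory

        // Reusing the cached dataset avoids rereading the input for each query,
        // the case where purely acyclic, disk-based data flows are inefficient.
        val totalErrors = errors.count()
        val timeouts    = errors.filter(_.contains("timeout")).count()

        println(s"errors=$totalErrors, timeouts=$timeouts")
        sc.stop()
      }
    }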

[1] Aart J. C. Bik et al. Pregel: a system for large-scale graph processing, 2010, SIGMOD Conference.

[2] Michael D. Ernst et al. HaLoop, 2010, Proc. VLDB Endow.

[3] Ravi Kumar et al. Pig Latin: a not-so-foreign language for data processing, 2008, SIGMOD Conference.

[4] Miguel Castro et al. Safe and efficient sharing of persistent objects in Thor, 1996, SIGMOD '96.

[5] Jinyang Li et al. Piccolo: Building Fast, Distributed Programs with Partitioned Tables, 2010, OSDI.

[6] Scott Shenker et al. Delay scheduling: a simple technique for achieving locality and fairness in cluster scheduling, 2010, EuroSys '10.

[7] Scott Shenker et al. Spark: Cluster Computing with Working Sets, 2010, HotCloud.

[8] James Frew et al. Lineage retrieval for scientific data processing: a survey, 2005, CSUR.

[9] Willy Zwaenepoel et al. Implementation and performance of Munin, 1991, SOSP '91.

[10] Kai Li et al. IVY: A Shared Virtual Memory System for Parallel Computing, 1988, ICPP.

[11] Yuan Yu et al. Dryad: distributed data-parallel programs from sequential building blocks, 2007, EuroSys '07.

[12] David Gelernter et al. Generative communication in Linda, 1985, TOPL.

[13] Sanjay Ghemawat et al. MapReduce: Simplified Data Processing on Large Clusters, 2004, OSDI.

[14] Randy H. Katz et al. Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center, 2011, NSDI.

[15] Douglas Stott Parker et al. Map-reduce-merge: simplified relational data processing on large clusters, 2007, SIGMOD '07.

[16] Bill Nitzberg et al. Distributed shared memory: a survey of issues and algorithms, 1991, Computer.

[17] Geoffrey C. Fox et al. Twister: a runtime for iterative MapReduce, 2010, HPDC '10.

[18] Anne-Marie Kermarrec et al. A recoverable distributed shared memory integrating coherence and recoverability, 1995, Twenty-Fifth International Symposium on Fault-Tolerant Computing, Digest of Papers.

[19] Benjamin Hindman et al. A Common Substrate for Cluster Computing, 2009, HotCloud.