Please mind the gap between intra- and inter-chassis parallelism! (and how it can be closed)
Programming abstractions that simplify distributed parallel computing have been widely adopted. Yet intra-chassis parallelism is still regarded as challenging, despite its often compelling performance advantages over distribution. We believe that—especially in the face of increasing architectural diversity inside the chassis [3]—there are benefits to programming a single machine using abstractions similar to those used for programming distributed systems. Multi-core operating systems have already realized this vision by using message passing as a core primitive [1], but we argue that in large-scale parallel data processing we can go further: in particular, by requiring the application programmer to write only straight-line serial task code, and auto-parallelizing it transparently in the execution framework, which internally exploits optimizations available inside a chassis, such as shared memory. The particular appeal of this approach lies in its generality: the same application code seamlessly scales out to clusters of machines. Recent work on task-parallel programming models has introduced abstractions that permit dynamic adaptation of task-parallel programs to their execution environments [2]. Based on this, we conjecture that expressing programs as dynamic task graphs achieves the generality we seek. Efficient support for such a model, however, requires integrating OS-level and runtime resource management for task placement, which is feasible in a data-center environment. Our prototype system achieves performance comparable to a shared-memory implementation of the k-means clustering algorithm when running inside a multi-core machine, while also scaling beyond the computational capacity of a single machine, thus enabling multi-scale parallelism for the algorithm.

Existing, easy-to-use distributed programming models are sufficient to enable multi-scale parallel computation inside and across machines.
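To make the programming model concrete, the following is a minimal, hypothetical sketch of the style of program described above: the programmer writes straight-line serial task code (here, the assignment and reduction steps of k-means), and a runtime spawns the tasks and decides where they run. The `assign_partition`, `reduce_centroids`, and `kmeans` names, and the use of a thread pool as a stand-in for an intra-chassis (or cluster) scheduler, are illustrative assumptions, not the paper's actual API.

```python
# Hypothetical sketch: serial task code plus a trivial runtime that places tasks.
# In the envisioned system, the same task code could instead be shipped across
# machines; here a thread pool stands in for the scheduler.
from concurrent.futures import ThreadPoolExecutor
import random

def assign_partition(points, centroids):
    """Serial task: assign each point in one partition to its nearest centroid;
    return per-centroid (sum, count) partial results for the reduction step."""
    k = len(centroids)
    sums = [[0.0, 0.0] for _ in range(k)]
    counts = [0] * k
    for x, y in points:
        best = min(range(k),
                   key=lambda c: (x - centroids[c][0]) ** 2 + (y - centroids[c][1]) ** 2)
        sums[best][0] += x
        sums[best][1] += y
        counts[best] += 1
    return sums, counts

def reduce_centroids(partials, old_centroids):
    """Serial task: combine partial sums from all partitions into new centroids."""
    k = len(old_centroids)
    sums = [[0.0, 0.0] for _ in range(k)]
    counts = [0] * k
    for psums, pcounts in partials:
        for c in range(k):
            sums[c][0] += psums[c][0]
            sums[c][1] += psums[c][1]
            counts[c] += pcounts[c]
    return [(sums[c][0] / counts[c], sums[c][1] / counts[c]) if counts[c]
            else old_centroids[c] for c in range(k)]

def kmeans(partitions, centroids, iterations, pool):
    # Each iteration dynamically spawns one assignment task per data partition;
    # the runtime places and executes them, then a reduction task combines results.
    for _ in range(iterations):
        futures = [pool.submit(assign_partition, part, centroids) for part in partitions]
        partials = [f.result() for f in futures]
        centroids = reduce_centroids(partials, centroids)
    return centroids

if __name__ == "__main__":
    random.seed(0)
    data = [(random.random(), random.random()) for _ in range(10000)]
    partitions = [data[i::4] for i in range(4)]   # four data partitions
    initial = random.sample(data, 3)              # k = 3 initial centroids
    with ThreadPoolExecutor(max_workers=4) as pool:
        print(kmeans(partitions, initial, iterations=5, pool=pool))
```

The point of the sketch is that the task bodies contain no placement or communication logic; whether the per-partition tasks run on cores sharing memory or on remote machines is a decision left entirely to the runtime.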
[1] Steven Hand et al. The case for reconfigurable I/O channels. 2012.
[2] Adrian Schüpbach et al. Your computer is already a distributed system. Why isn't your OS? HotOS, 2009.
[3] Steven Hand et al. Non-Deterministic Parallelism Considered Useful. HotOS, 2011.