Turbine: a distributed-memory dataflow engine for extreme-scale many-task applications

Efficiently utilizing the rapidly increasing concurrency of multi-petaflop computing systems is a significant programming challenge. One approach is to structure applications with an upper layer of many loosely-coupled, coarse-grained tasks, each comprising a tightly-coupled parallel function or program. "Many-task" programming models such as functional parallel dataflow may be used at the upper layer to generate massive numbers of tasks, each of which generates significant tightly-coupled parallelism at the lower level via multithreading, message passing, and/or partitioned global address spaces. At large scales, however, managing task distribution, data dependencies, and inter-task data movement becomes a significant performance challenge. In this work, we describe Turbine, a new, highly scalable, distributed many-task dataflow engine. Turbine executes a generalized many-task intermediate representation with automated self-distribution and is scalable to multi-petaflop infrastructures. We present here the architecture of Turbine and its performance on highly concurrent systems.
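
To make the two-level model concrete, the following is a minimal Python sketch of the many-task pattern the abstract describes: an upper layer generates many coarse-grained tasks, each standing in for a tightly-coupled parallel computation, and a downstream task runs when its data dependencies are satisfied. This is an illustrative sketch only, not the Turbine or Swift API; the names simulate, analyze, and the sweep size are hypothetical stand-ins.

    # Minimal sketch of the many-task dataflow pattern (not Turbine's API).
    from concurrent.futures import ProcessPoolExecutor, as_completed

    def simulate(parameter):
        # Stand-in for launching one tightly-coupled parallel computation
        # (e.g., an MPI or multithreaded solver) on a single parameter point.
        return parameter * parameter

    def analyze(results):
        # Downstream task that consumes the upstream outputs, expressing a
        # data dependency rather than explicit synchronization.
        return sum(results) / len(results)

    if __name__ == "__main__":
        with ProcessPoolExecutor() as pool:
            # Upper layer: generate many independent coarse-grained tasks.
            futures = [pool.submit(simulate, p) for p in range(1000)]
            # Dataflow dependency: collect results as each task completes,
            # then run the dependent analysis step.
            outputs = [f.result() for f in as_completed(futures)]
            print(analyze(outputs))

At extreme scale, the point of an engine like Turbine is that the task generation, dependency tracking, and data movement implied by such a script cannot be handled by a single coordinating process and must themselves be distributed.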
