DaSH: a benchmark suite for hybrid dataflow and shared memory programming models: with comparative evaluation of three hybrid dataflow models

A current trend in the development of parallel programming models is to combine well-established models into a single programming model that supports efficient implementation of a wide range of real-world applications. The dataflow model in particular has recaptured the interest of the research community because of its ability to express parallelism efficiently. As a result, a number of recently proposed hybrid parallel programming models combine dataflow with traditional shared memory, and their findings have influenced the introduction of task dependencies in the OpenMP 4.0 standard. In this paper, we present DaSH, the first comprehensive benchmark suite for hybrid dataflow and shared memory programming models. DaSH features 11 benchmarks, each representing one of the Berkeley dwarfs that capture patterns of communication and computation common to a wide range of emerging applications. We also include sequential and shared-memory implementations based on OpenMP and TBB to facilitate comparison between hybrid dataflow implementations and traditional shared-memory implementations based on work-sharing and/or tasks. Finally, we use DaSH to evaluate three different hybrid dataflow models, identify their advantages and shortcomings, and motivate further research on their characteristics.
