A Minimalistic Dataflow Programming Library for Python

Current work on parallel programming models is trending towards the dataflow paradigm. Previous work on the topic has shown that dataflow programming is a natural way to exploit parallelism in programs. However, there is still a gap in ease of programming between the high-level languages adopted by the scientific community and the languages and tools available for dataflow programming. In this paper we present Sucuri, a minimalistic Python library that provides dataflow programming with a reasonably simple syntax. To parallelize an application with our library, the programmer only needs to identify the functions in their code that are good candidates for parallelization and instantiate a dataflow graph in which each node is associated with one such function and the edges between nodes describe the data dependencies between functions. We implement two benchmarks that represent important parallel programming patterns using our library and execute them on a cluster of multicores. Experimental results are promising and suggest that our library can be an attractive first option for parallelization.
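The graph model described above (nodes bound to functions, edges carrying data dependencies) can be illustrated with a minimal sketch. This is not Sucuri's API: the `Node` class and `run_graph` function below are hypothetical names introduced only for illustration, and the sketch runs on a thread pool inside a single process rather than on a cluster.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


class Node:
    """Hypothetical dataflow node: a function plus the nodes that feed it."""

    def __init__(self, func, *inputs):
        self.func = func
        self.inputs = list(inputs)  # incoming edges (data dependencies)


def run_graph(nodes, workers=4):
    """Fire each node as soon as all of its inputs have produced a result."""
    results = {}          # node -> value it produced
    pending = set(nodes)  # nodes that have not been submitted yet
    running = {}          # future -> node currently executing
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while pending or running:
            # A node is ready when every one of its operands is available.
            ready = [n for n in pending
                     if all(dep in results for dep in n.inputs)]
            for node in ready:
                pending.discard(node)
                args = [results[dep] for dep in node.inputs]
                running[pool.submit(node.func, *args)] = node
            if not running:
                raise ValueError("graph has a cycle or an unknown input node")
            done, _ = wait(list(running), return_when=FIRST_COMPLETED)
            for future in done:
                results[running.pop(future)] = future.result()
    return results


# Fork-join example: one source feeds two independent nodes, then a join.
source = Node(lambda: list(range(10)))
evens = Node(lambda xs: sum(x for x in xs if x % 2 == 0), source)
odds = Node(lambda xs: sum(x for x in xs if x % 2 == 1), source)
total = Node(lambda a, b: a + b, evens, odds)

results = run_graph([source, evens, odds, total])
print(results[total])  # 45
```

In this sketch `evens` and `odds` have no edge between them, so they may run concurrently once `source` finishes, while `total` waits for both. The programmer only declares the dependencies; the firing order follows from the graph.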
