The Nornir Run-time System for Parallel Programs Using Kahn Process Networks

Shared-memory concurrency is the prevalent paradigm used for developing parallel applications targeted towards small- and middle-sized machines, but experience has shown that it is hard to use. This is largely caused by synchronization primitives which are low-level, inherently nondeterministic, and, consequently, non-intuitive to use. In this paper, we present the \textit{Nornir} run-time system. Nornir is comparable to well-known frameworks like MapReduce and Dryad, but has additional support for process structures containing cycles. It is based on the formalism of Kahn process networks, which we deem as a simple and deterministic alternative to shared-memory concurrency. Experiments with real and synthetic benchmarks on up to 8 CPUs show that performance in most cases improves almost linearly with the number of CPUs, when not limited by data dependencies.

[1]  Gilles Kahn,et al.  The Semantics of a Simple Language for Parallel Programming , 1974, IFIP Congress.

[2]  Yuan Yu,et al.  Dryad: distributed data-parallel programs from sequential building blocks , 2007, EuroSys '07.

[3]  Michael I. Gordon,et al.  Exploiting coarse-grained task, data, and pipeline parallelism in stream programs , 2006, ASPLOS XII.

[4]  Carsten Griwodz,et al.  Kahn Process Networks are a Flexible Alternative to MapReduce , 2009, 2009 11th IEEE International Conference on High Performance Computing and Communications.

[5]  Twan Basten,et al.  Requirements on the Execution of Kahn Process Networks , 2003, ESOP.

[6]  Brian L. Evans,et al.  A Distributed Deadlock Detection and Resolution Algorithm for Process Networks , 2007, 2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07.

[7]  Sanjay Ghemawat,et al.  MapReduce: Simplified Data Processing on Large Clusters , 2004, OSDI.

[8]  Andy D. Pimentel,et al.  Towards Multi-application Workload Modeling in Sesame for System-Level Design Space Exploration , 2007, SAMOS.

[9]  C. Greg Plaxton,et al.  Thread Scheduling for Multiprogrammed Multiprocessors , 1998, SPAA.

[10]  Ravi Kumar,et al.  Pig latin: a not-so-foreign language for data processing , 2008, SIGMOD Conference.

[11]  Ümit V. Çatalyürek,et al.  Hypergraph-based Dynamic Load Balancing for Adaptive Scientific Computations , 2007, 2007 IEEE International Parallel and Distributed Processing Symposium.

[12]  Erwin A. de Kock,et al.  YAPI: application modeling for signal processing systems , 2000, Proceedings 37th Design Automation Conference.

[13]  Edward A. Lee,et al.  Heterogeneous Concurrent Modeling and Design in Java (Volume 1: Introduction to Ptolemy II) , 2008 .

[14]  Christoforos E. Kozyrakis,et al.  Evaluating MapReduce for Multi-core and Multiprocessor Systems , 2007, 2007 IEEE 13th International Symposium on High Performance Computer Architecture.

[15]  Carsten Griwodz,et al.  Evaluating the Run-Time Performance of Kahn Process Network Implementation Techniques on Shared-Memory Multiprocessors , 2009, 2009 International Conference on Complex, Intelligent and Software Intensive Systems.

[16]  Edward A. Lee,et al.  Dataflow process networks , 1995, Proc. IEEE.

[17]  John Giacomoni,et al.  FastForward for efficient pipeline parallelism: a cache-optimized concurrent lock-free queue , 2008, PPoPP.

[18]  C. Greg Plaxton,et al.  Thread Scheduling for Multiprogrammed Multiprocessors , 1998, SPAA '98.