Kahn Process Networks are a Flexible Alternative to MapReduce

Experience has shown that development with shared-memory concurrency, today's prevalent parallel programming paradigm, is hard, and that its synchronization primitives are nonintuitive because they are low-level and inherently nondeterministic. To help developers, we propose Kahn process networks, which are based on message passing and a shared-nothing model, as a simple and flexible tool for modeling parallel applications. We argue that they are more flexible than MapReduce, which is widely recognized for its efficiency and simplicity. Kahn process networks are nonetheless equally intuitive to use, and MapReduce can itself be implemented as a Kahn process network. Our benchmarks (word count and k-means) show that a Kahn process network framework permits alternative implementations that bring significant performance advantages: the two programs run up to $\sim 2.8$ times (word count) and $\sim 1.8$ times (k-means) faster than their implementations for Phoenix, a MapReduce framework specifically optimized for multicore machines.
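To illustrate the claim that MapReduce can be expressed as a Kahn process network, the following is a minimal sketch of word count, not the framework or the implementation evaluated in the paper. It assumes Python threads as processes and unbounded `queue.Queue` objects as channels, each with a single writer and a single reader. The names `mapper`, `reducer`, `word_count`, and the `END` token are hypothetical and chosen only for this example.

```python
import threading
import zlib
from collections import Counter
from queue import Queue

END = object()  # hypothetical end-of-stream token sent on every channel


def mapper(inp, outs):
    """Read lines from one channel; send (word, 1) to the reducer picked by a stable hash."""
    while (line := inp.get()) is not END:  # blocking read, as in Kahn's model
        for word in line.split():
            outs[zlib.crc32(word.encode()) % len(outs)].put((word, 1))
    for out in outs:
        out.put(END)


def reducer(inps, out):
    """Drain the input channels in a fixed order and sum the counts per word."""
    counts = Counter()
    for inp in inps:  # fixed read order, no polling, so the result is deterministic
        while (item := inp.get()) is not END:
            word, n = item
            counts[word] += n
    out.put(dict(counts))


def word_count(lines, n_map=2, n_red=2):
    map_in = [Queue() for _ in range(n_map)]
    # One channel per (mapper, reducer) pair: single writer, single reader.
    chans = [[Queue() for _ in range(n_red)] for _ in range(n_map)]
    red_out = [Queue() for _ in range(n_red)]

    procs = [threading.Thread(target=mapper, args=(map_in[m], chans[m]))
             for m in range(n_map)]
    procs += [threading.Thread(target=reducer,
                               args=([chans[m][r] for m in range(n_map)], red_out[r]))
              for r in range(n_red)]
    for p in procs:
        p.start()

    # The caller acts as the source process: scatter lines round-robin.
    for i, line in enumerate(lines):
        map_in[i % n_map].put(line)
    for q in map_in:
        q.put(END)

    # Each word is owned by exactly one reducer, so the partial results are disjoint.
    result = {}
    for q in red_out:
        result.update(q.get())
    for p in procs:
        p.join()
    return result
```

Because every process only performs blocking reads on one channel at a time and never tests a channel for emptiness, the output does not depend on how the threads are scheduled. Calling `word_count(["a b a", "b c"])` under these assumptions yields the per-word totals for the two lines.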
