New implementation of a BSP composition primitive with application to the implementation of algorithmic skeletons

BSML is an ML-based language designed for coding Bulk Synchronous Parallel (BSP) algorithms. It allows execution time to be estimated and avoids deadlocks and non-determinism. BSML extends ML programming with a small set of primitives. One of these primitives, called parallel superposition, allows the parallel composition of two BSP programs. However, its previous implementation used system threads and had unjustified limitations. This paper presents a new implementation of this primitive based on a continuation-passing-style (CPS) transformation guided by a flow analysis. To test it and show its usefulness, we have also implemented the OCamlP3l algorithmic skeletons and compared their efficiency with that of the original ones.
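To illustrate why continuations can replace system threads here, the following is a minimal single-process sketch in Python. It is not BSML and not the paper's implementation: the names `Barrier`, `Done`, `prog_a`, `prog_b` and `superpose` are hypothetical, the programs are written in CPS by hand rather than produced by a flow-guided transformation, and no communication takes place. Each program returns either a final value or the continuation it suspended at a synchronisation barrier, so a plain loop can advance both programs superstep by superstep and let them share each barrier.

```python
# Illustrative sketch only: hand-written CPS, hypothetical names, no real BSP runtime.

class Barrier:
    """A program suspended at a synchronisation barrier."""
    def __init__(self, resume):
        self.resume = resume  # zero-argument function: the rest of the program


class Done:
    """A program that has finished with a final value."""
    def __init__(self, value):
        self.value = value


def prog_a(log, k):
    """Two supersteps: local work, one barrier, local work, then call k."""
    log.append("a1")

    def after_sync():
        log.append("a2")
        return k("result a")

    return Barrier(after_sync)


def prog_b(log, k):
    """Three supersteps separated by two barriers."""
    log.append("b1")

    def after_first_sync():
        log.append("b2")

        def after_second_sync():
            log.append("b3")
            return k("result b")

        return Barrier(after_second_sync)

    return Barrier(after_first_sync)


def superpose(p1, p2, log):
    """Run two CPS programs together, sharing barriers, without threads.

    Returns the pair of results and the number of shared barriers.
    """
    # Run each program up to its first barrier (or to completion).
    states = [p1(log, Done), p2(log, Done)]
    barriers = 0
    while any(isinstance(s, Barrier) for s in states):
        # One barrier serves every program that is currently waiting.
        log.append("barrier")
        barriers += 1
        states = [s.resume() if isinstance(s, Barrier) else s for s in states]
    return (states[0].value, states[1].value), barriers
```

Calling `superpose(prog_a, prog_b, [])` in this sketch performs two barriers in total, whereas running the two programs one after the other would perform three (one for `prog_a`, two for `prog_b`). Suspended computations are ordinary closures, so no thread stacks or locks are involved.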
