Thread Migration in a Parallel Graph Reducer

To support high-level coordination, parallel functional languages need effective and automatic work distribution mechanisms. Many implementations distribute potential work, i.e. sparks or closures, but there is good evidence that the performance of certain classes of program can be improved if current work, i.e. threads, is also distributed. Migrating a thread incurs significant execution cost and requires careful scheduling and an elaborate implementation. This paper describes the design, implementation and performance of thread migration in the GUM runtime system underlying Glasgow parallel Haskell (GPH). Measurements of nontrivial programs on a high-latency cluster architecture show that thread migration can improve the performance of data-parallel and divide-and-conquer programs with low processor utilisation. Thread migration also reduces the variation in performance results obtained in separate executions of a program. Moreover, migration does not incur significant overheads if there are no migratable threads, or on a single processor. However, for programs that already exhibit good processor utilisation, migration may increase performance variability and very occasionally reduce performance.
