Lazy Threads: Implementing a Fast Parallel Call

In this paper, we describe lazy threads, a new approach for implementing multithreaded execution models on conventional machines. We show how they can implement a parallel call at nearly the efficiency of a sequential call. The central idea is to specialize the representation of a parallel call so that it can execute as a parallel-ready sequential call. This allows excess parallelism to degrade into sequential calls, with the attendant efficient stack management and direct transfer of control and data, while a call that truly needs to execute in parallel gets its own thread of control. The efficiency of lazy threads is achieved through careful attention to storage management and a code generation strategy that allows us to represent potential parallel work with no overhead.
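
The following minimal sketch illustrates the idea of a parallel-ready sequential call. It is a single-threaded toy model, not the paper's implementation (which operates on stack frames in compiled code). It assumes a child is written as a Python generator function whose `yield` marks a point where the child must wait, for example on a value that is not yet available. The names `LazyScheduler`, `pcall`, `Future`, and `square` are hypothetical. A parallel call runs the child inline; if the child finishes without suspending, its result is handed straight back as from a sequential call, and only a child that actually suspends is promoted to its own thread of control.

```python
from collections import deque


class Future:
    """Result slot for a child that was promoted to its own thread."""

    def __init__(self):
        self.done = False
        self.value = None


class LazyScheduler:
    """Toy, single-threaded model of a lazy parallel call (hypothetical names).

    Assumption: a child is a generator function, and `yield` marks a point
    where it must wait (e.g. for a synchronizing value not yet produced).
    """

    def __init__(self):
        self.ready = deque()      # (generator, future) pairs for promoted children
        self.threads_created = 0  # how many parallel calls actually needed a thread

    def pcall(self, fn, *args):
        """Parallel call: first run fn(*args) inline, like a sequential call."""
        gen = fn(*args)
        try:
            next(gen)  # run on the caller's "stack" until the first suspension
        except StopIteration as stop:
            # Fast path: the child finished without suspending, so its result
            # is returned directly and no thread is ever created.
            return stop.value
        # Slow path: the child suspended, so only now does it get its own
        # thread of control; the caller continues holding a future.
        self.threads_created += 1
        fut = Future()
        self.ready.append((gen, fut))
        return fut

    def run(self):
        """Round-robin the promoted children until all have finished."""
        while self.ready:
            gen, fut = self.ready.popleft()
            try:
                next(gen)
            except StopIteration as stop:
                fut.done, fut.value = True, stop.value
            else:
                self.ready.append((gen, fut))  # suspended again: requeue


def square(n, must_wait):
    """Hypothetical child: suspends once, but only when must_wait is true."""
    if must_wait:
        yield  # pretend to wait on a synchronizing value
    return n * n


sched = LazyScheduler()
results = [sched.pcall(square, n, n % 4 == 0) for n in range(8)]
sched.run()
values = [r.value if isinstance(r, Future) else r for r in results]
print(values, sched.threads_created)  # prints [0, 1, 4, 9, 16, 25, 36, 49] 2
```

Here only the two children that suspend (n = 0 and n = 4) are promoted to threads; the other six parallel calls degrade into ordinary calls. Returning either a plain value or a `Future` keeps the fast path free of thread bookkeeping, at the cost of making the caller check which one it received.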