An efficient and general implementation of futures on large scale shared-memory multiprocessors

This thesis describes a high-performance implementation technique for Multilisp's "future" parallelism construct. The method addresses the non-uniform memory access (NUMA) problem inherent in large-scale shared-memory multiprocessors. The technique is based on lazy task creation (LTC), a dynamic task partitioning mechanism that dramatically reduces the cost of task creation and consequently makes it possible to exploit fine-grained parallelism. In LTC, idle processors obtain work by "stealing" tasks from other processors. A previously proposed implementation of LTC is the shared-memory (SM) protocol. Its main disadvantage is that it forces the stack to be cached suboptimally on cache-incoherent machines. This thesis proposes a new implementation technique for LTC that allows full caching of the stack: the message-passing (MP) protocol. Idle processors ask for work by sending "work request" messages to other processors. On receiving such a message, a processor checks its private stack and task queue and sends back a task if one is available. The MP protocol has the added benefits of a lower task creation cost and simpler algorithms. Extensive experiments evaluate the performance of both protocols on large shared-memory multiprocessors: a 90-processor GP1000 and a 32-processor TC2000. The results show that the MP protocol is consistently better than the SM protocol, by as much as a factor of two when a cache is available and a factor of 1.2 when it is not. In addition, the thesis shows that the semantics of the Multilisp language do not have to be impoverished to attain good performance. The laziness of LTC can be exploited to support several programming features at virtually no cost, including the Katz-Weise continuation semantics with legitimacy, dynamic scoping, and fairness.
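The work-request exchange described above can be illustrated with a small single-threaded simulation. This is only a sketch under stated assumptions, not the thesis's implementation: the names `Processor`, `push_lazy`, `poll`, and `request_work` are hypothetical, and the choices to serve the task queue before the stack and to hand over the oldest lazy task are assumptions made for illustration. The point it shows is that only the owner ever touches its own stack and task queue; a thief only appends a message to the victim's mailbox.

```python
from collections import deque


class Processor:
    """Hypothetical model of one processor in a message-passing steal protocol."""

    def __init__(self, pid):
        self.pid = pid
        self.stack = deque()    # private stack of lazy tasks, newest on the right
        self.queue = deque()    # private queue of ready tasks
        self.mailbox = deque()  # ids of processors that sent a "work request"

    def push_lazy(self, task):
        # Creating a lazy task only records it on the private stack.
        self.stack.append(task)

    def poll(self, processors):
        """Answer pending work requests; return how many were satisfied."""
        served = 0
        while self.mailbox:
            thief = processors[self.mailbox.popleft()]
            if self.queue:
                # Assumption: ready tasks are handed out before lazy ones.
                thief.queue.append(self.queue.popleft())
                served += 1
            elif self.stack:
                # Assumption: the oldest lazy task is the one given away.
                thief.queue.append(self.stack.popleft())
                served += 1
            # Otherwise no task is available and the request goes unanswered here.
        return served


def request_work(thief, victim):
    """An idle processor asks another processor for work by sending a message."""
    victim.mailbox.append(thief.pid)


# Usage: processor 1 is idle and asks processor 0 for work.
p0, p1 = Processor(0), Processor(1)
processors = [p0, p1]
p0.push_lazy("task-a")
p0.push_lazy("task-b")
request_work(p1, p0)
p0.poll(processors)
print(list(p1.queue), list(p0.stack))
```

Because the victim answers requests itself at polling points, its stack is never written by another processor in this model, which is the property the abstract ties to full caching of the stack.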
