StackThreads/MP: Integrating Futures into Calling Standards

This report describes an implementation scheme for fine-grain multithreading that requires no changes to current calling standards for sequential languages and only modest extensions to sequential compilers. Like previous similar systems, it performs an asynchronous call as if it were an ordinary procedure call, and detaches the callee from the caller when the callee suspends or when either of them migrates to another processor. Unlike previous similar systems, it detaches and connects arbitrary frames generated by off-the-shelf sequential compilers that obey calling standards. As a consequence, it requires neither a front-end preprocessor nor a native code generator with a built-in notion of parallelism. In practice, the system works with an unmodified GNU C compiler (GCC). The extensions to sequential compilers that are desirable for guaranteeing the portability and correctness of the scheme are clarified and argued to be modest. Experiments indicate that sequential performance is not sacrificed for practical applications, and that both sequential and parallel performance are comparable to those of Cilk [10], whose current implementation requires a fairly sophisticated preprocessor to C. These results show that efficient asynchronous calls (a.k.a. future calls) can be integrated into current calling standards with very little impact on either sequential performance or compiler engineering.

ANY OTHER IDENTIFYING INFORMATION OF THIS REPORT: A summary of this report [28] has been published in the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP '99). Updates are available from http://www.yl.is.s.u-tokyo.ac.jp/sthreads/.
DISTRIBUTION STATEMENT: First issue 35 copies.
REPORT DATE: February 1999
TOTAL NO. OF PAGES: 33
WRITTEN LANGUAGE: English
NO. OF REFERENCES: 30

DEPARTMENT OF INFORMATION SCIENCE
Faculty of Science, University of Tokyo
7-3-1 Hongo, Bunkyo-ku, Tokyo 113, Japan

Kenjiro Taura, Kunio Tabata, and Akinori Yonezawa
{tau,tabata,yonezawa}@is.s.u-tokyo.ac.jp
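The abstract's central idea, that an asynchronous call starts exactly like an ordinary procedure call and the callee is detached from its caller only if it suspends, can be illustrated with a small C sketch. It assumes POSIX `<ucontext.h>` (`getcontext`, `makecontext`, `swapcontext`, as provided by glibc) and gives every callee a private stack, which is precisely the cost StackThreads/MP avoids by detaching frames that an unmodified compiler lays out on a single stack; the sketch therefore mimics only the observable control flow, not the paper's mechanism. All names (`future_t`, `async_call_t`, `async_call`, `touch_or_suspend`, `resume`, `doubler`, `demo`) are hypothetical.

```c
/* Sketch only: models the control flow of a future call with POSIX ucontext
 * (assumed available, as in glibc). Each callee gets a private stack here,
 * unlike StackThreads/MP, which detaches compiler-generated frames in place.
 * All names are hypothetical. */
#include <assert.h>
#include <ucontext.h>

#define STACK_SIZE (64 * 1024)

/* Hypothetical single-assignment cell. */
typedef struct {
    int ready;
    int value;
} future_t;

/* Hypothetical record for one asynchronous (future) call. */
typedef struct {
    ucontext_t callee_ctx;   /* callee state saved when it suspends */
    ucontext_t resume_ctx;   /* caller state to return to on suspend or finish */
    char stack[STACK_SIZE];  /* private stack for the callee (assumption) */
    int detached;            /* set once the callee has suspended */
    int finished;            /* set when the callee has returned */
} async_call_t;

static async_call_t *current;  /* call whose callee is running now */
static future_t input;         /* value the callee waits for */
static future_t output;        /* result the callee produces */

/* Callee side: while the value is missing, detach and give control back. */
static void touch_or_suspend(future_t *f) {
    while (!f->ready) {
        async_call_t *self = current;
        self->detached = 1;
        swapcontext(&self->callee_ctx, &self->resume_ctx);
    }
}

/* Example callee: doubles the input once it is available. */
static void doubler(void) {
    touch_or_suspend(&input);
    output.value = 2 * input.value;
    output.ready = 1;
    current->finished = 1;
}

/* Run fn immediately, like an ordinary call. Returns when fn finishes
 * (common case) or when it suspends (it is then detached from the caller). */
static void async_call(async_call_t *c, void (*fn)(void)) {
    c->detached = 0;
    c->finished = 0;
    getcontext(&c->callee_ctx);
    c->callee_ctx.uc_stack.ss_sp = c->stack;
    c->callee_ctx.uc_stack.ss_size = sizeof c->stack;
    c->callee_ctx.uc_link = &c->resume_ctx;  /* on return, go back to the caller */
    makecontext(&c->callee_ctx, fn, 0);
    current = c;
    swapcontext(&c->resume_ctx, &c->callee_ctx);
}

/* Continue a detached callee, e.g. after the value it waits for arrives. */
static void resume(async_call_t *c) {
    if (c->finished)
        return;
    current = c;
    swapcontext(&c->resume_ctx, &c->callee_ctx);
}

/* Demo: returns the callee's result and reports whether it was detached. */
int demo(int input_ready_at_call, int *was_detached) {
    static async_call_t call;
    input.value = 21;
    input.ready = input_ready_at_call;
    output.ready = 0;
    async_call(&call, doubler);
    *was_detached = call.detached;
    if (!output.ready) {      /* callee suspended; the caller went on */
        input.ready = 1;      /* the awaited value arrives later */
        resume(&call);
    }
    assert(output.ready);
    return output.value;
}
```

Tracing the two cases: `demo(1, &d)` finds the input ready, so `doubler` runs to completion inside `async_call` and the caller simply waits for it, as with a procedure call; `demo(0, &d)` makes `doubler` suspend, `async_call` returns to the caller with the callee detached, and a later `resume` finishes it. Both return 42; `d` is 0 in the first case and 1 in the second.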

[1] Matteo Frigo, et al. The implementation of the Cilk-5 multithreaded language, 1998, PLDI.

[2] C. Leiserson, et al. Scheduling multithreaded computations by work stealing, 1999, Proceedings 35th Annual Symposium on Foundations of Computer Science.

[3] Akinori Yonezawa, et al. An Efficient Compilation Framework for Languages Based on a Concurrent Process Calculus, 1997, Euro-Par.

[4] Xingbin Zhang, et al. A Hybrid Execution Model for Fine-Grained Languages on Distributed Memory Multicomputers, 1995, Proceedings of the IEEE/ACM SC95 Conference.

[5] Luca Cardelli, et al. Modula-3 Report (revised), 1992.

[6] David E. Culler, et al. Fine-grain parallelism with minimal hardware support: a compiler-controlled threaded abstract machine, 1991, ASPLOS IV.

[7] Andrew A. Chien, et al. ICC++: a C++ dialect for high performance parallel computing, 1996, SIAP.

[8] Kazunori Ueda, et al. Design of the Kernel Language for the Parallel Inference Machine, 1990, The Computer Journal.

[9] Devang Shah, et al. Programming with threads, 1996.

[10] Seth Copen Goldstein, et al. Lazy Threads: Implementing a Fast Parallel Call, 1996, J. Parallel Distrib. Comput.

[11] Akinori Yonezawa, et al. Fine-grain multithreading with minimal compiler support: a cost effective approach to implementing efficient multithreading languages, 1997, PLDI '97.

[12] Rishiyur S. Nikhil and Arvind. Id: a language with implicit parallelism, 1992.

[13] Marc Feeley, et al. A Message Passing Implementation of Lazy Task Creation, 1992, Parallel Symbolic Computing.

[14] Akinori Yonezawa, et al. Object-oriented concurrent programming ABCL/1, 1986, OOPSLA '86.

[15] Richard M. Stallman, et al. Using and Porting GNU CC, 1998.

[16] Anne Rogers, et al. Supporting dynamic data structures on distributed-memory machines, 1995, TOPLAS.

[17] Akinori Yonezawa, et al. Schematic: A Concurrent Object-Oriented Extension to Scheme, 1995, OBPDC.

[18] Robert H. Halstead, et al. Lazy task creation: a technique for increasing the granularity of parallel programs, 1990, LISP and Functional Programming.

[19] Robert H. Halstead. MULTILISP: a language for concurrent symbolic computation, 1985, TOPLAS.

[20] Seth Copen Goldstein, et al. Separation constraint partitioning: a new algorithm for partitioning non-strict programs into sequential threads, 1995, POPL '95.

[21] Satoshi Matsuoka, et al. StackThreads: An Abstract Machine for Scheduling Fine-Grain Threads on Stock CPUs, 1994, Theory and Practice of Parallel Programming.

[22] Bradley C. Kuszmaul, et al. Cilk: an efficient multithreaded runtime system, 1995, PPoPP '95.

[23] Eric Mohr. Dynamic partitioning of parallel Lisp programs, 1992.