Efficient Compile-Time/Run-Time Contraction of Fine Grain Data Parallel Codes

This research studies the contraction problem for data parallel languages: efficiently executing code written for a fine grain virtual parallel machine on a much coarser grain actual parallel machine. This paper identifies the issues involved in solving this problem and addresses three of them in detail: efficiently implementing tasks that emulate virtual processors, efficiently scheduling these tasks, and reducing space overhead while maintaining data consistency among these tasks. For task implementation, we address saving a reduced register state at context switches. For task scheduling, we propose heuristics that minimize scheduling cost and promote data locality. We also discuss minimizing space overhead using assumptions about the communication paradigm, run-time techniques, and compile-time analysis. Finally, experimental results are presented for one of the main issues discussed, scheduling. The results show that, for three different problems, the proposed scheduling heuristics are both cost efficient and locality promoting when compared to three other scheduling policies.
