PSather: Layered Extensions to an Object-Oriented Language for Efficient Parallel Computation

class $DIST{T} is
   -- supertype of all distributed data structures
   -- to be used with the dist-statement
   nr_of_chunks: INT;

   is_aligned_with(d: $DIST{$OB}): BOOL is
      -- check alignment
      if nr_of_chunks = d.nr_of_chunks then
         loop
            res := chunks!.where = d.chunks!.where;
            until!(res = false)
         end;
      end;
   end;

   iter chunks!: T;
      -- iterate through all chunks in a given sequence
   iter chunks_on!(INT): T;
      -- iterate through all chunks on a given cluster
end;

Figure 14: Abstract $DIST{T} class

chunks_on!: The iter chunks_on! yields all chunks of a distributed data structure on a given cluster. It will be used by most implementations of the dist-statement.

Note that the abstract class $DIST merely implements the minimal features needed by the dist-statement. Concrete implementations of distributed objects provide a richer interface, including operations to create and redistribute distributed objects and to add and remove chunks. We will discuss a number of particularly useful implementations of distributed objects.

One simple implementation of $DIST{T} uses ALIST{T} to implement the directory (Figure 15). This is the data structure as drawn in Figure 13. A more interesting implementation is built on top of SPREAD, with subdirectories on each cluster (Figure 16). This implementation is optimized for distributed structures with chunks on all or most of the clusters and for big data-parallel operations. Note that a dist-statement can be executed in a distributed fashion without the directory becoming a bottleneck, because each cluster holds its part of the directory. This class could also serve as a base class for distributed structures with multiple parallel access, where the distributed directory prevents a directory-access bottleneck. More simply, we may provide a special class SIMPLE_DIST (Figure 17) with at most one chunk per cluster. This is a very common case for many distributed objects. Note that is_aligned_with becomes particularly simple when two SIMPLE_DIST{T} objects are compared.
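To make the alignment semantics concrete, the following is a minimal Python model of the $DIST{T} interface from Figure 14 (all Python names are hypothetical stand-ins, not pSather): a chunk is represented only by the cluster it lives on, and two distributed objects are aligned iff they have the same number of chunks and corresponding chunks are co-located.

```python
# Hypothetical Python model of the abstract $DIST{T} interface (Figure 14).
# A chunk is reduced to the id of the cluster it is located on ("where").

class Chunk:
    def __init__(self, where):
        self.where = where  # cluster this chunk lives on

class Dist:
    """Minimal stand-in for $DIST{T}."""
    def __init__(self, chunks):
        self._chunks = list(chunks)

    @property
    def nr_of_chunks(self):
        return len(self._chunks)

    def chunks(self):           # models 'iter chunks!'
        yield from self._chunks

    def chunks_on(self, cl):    # models 'iter chunks_on!'
        return (c for c in self._chunks if c.where == cl)

    def is_aligned_with(self, d):
        # Aligned iff same chunk count and chunks pairwise co-located.
        if self.nr_of_chunks != d.nr_of_chunks:
            return False
        return all(a.where == b.where
                   for a, b in zip(self.chunks(), d.chunks()))

x = Dist([Chunk(0), Chunk(1), Chunk(2)])
y = Dist([Chunk(0), Chunk(1), Chunk(2)])
z = Dist([Chunk(0), Chunk(2), Chunk(1)])
assert x.is_aligned_with(y) and not x.is_aligned_with(z)
```

Note that z has the right number of chunks but places them on the wrong clusters, so it fails the pairwise co-location test, just as in Figure 14.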
7.2 $DIST{T} and the dist Statement

pSather has a dist-statement for data-parallel computation on objects that are subtypes of $DIST{T}. In this section we first introduce the syntax and the semantics of the dist-statement; examples follow in the next section. Syntactically, the dist-statement is another block statement in pSather:

   dist_stmt => dist expr as ident (, expr as ident)* do stmt_list end

The dist-statement executes its body in parallel body threads, one for each chunk of a given distributed object. (Since the code in the body of the dist-statement is executed sequentially, whereas the single body threads are executed in parallel to each other, the body code determines the granularity of parallelism in a dist-statement.)

class LDIST{T} < $DIST{T} is
   include $DIST{T};

   dir: ALIST{T} := #ALIST{T};

   add_chunk(c: T) is
      dir := dir.push(c);
   end;

   nr_of_chunks: INT is
      res := dir.size;
   end;

   iter chunks!: T is
      loop res := dir.elts!; yield; end;
   end;

   iter chunks_on!(cl: INT): T is
      loop
         res := dir.elts!;
         if res.where = cl then yield; end;
      end;
   end;
end;

Figure 15: Simple implementation of $DIST{T} based on a list directory

class SDIST{T} < $DIST{T} is
   include SPREAD{ALIST{T}};
   include $DIST{T};

   add_chunk(c: T) is
      if [c.where] = void then [c.where] := #ALIST{T} @ c.where end;
      [c.where] := [c.where].push(c) @ c.where;
   end;

   nr_of_chunks: INT is
      loop
         c ::= CONFIG::clusters!;
         res := res + [c].size @ c;
      end;
   end;

   iter chunks!: T is
      loop
         c ::= CONFIG::clusters!;
         loop res := chunks_on!(c); yield; end;
      end;
   end;

   iter chunks_on!(cl: INT): T is
      if [cl] /= void then
         loop res := [cl].elts!; yield; end;
      end;
   end;
end;

Figure 16: Concrete implementation of $DIST{T} with a distributed directory

class SIMPLE_DIST{T} < $DIST{T} is
   include SPREAD{T};
   include $DIST{T}
      is_aligned_with($DIST{$OB}): BOOL -> o_is_aligned_with;

   nr_of_chunks: INT is
      loop
         if [CONFIG::clusters!] /= void then res := res + 1 end;
      end;
   end;

   is_aligned_with(d: $DIST{$OB}): BOOL is
      typecase d
      when SIMPLE_DIST{$OB} then res := nr_of_chunks = d.nr_of_chunks
      else res := o_is_aligned_with(d)
      end;
   end;

   iter chunks!: T is
      loop
         res := [CONFIG::clusters!];
         if res /= void then yield; end;
      end;
   end;

   iter chunks_on!(cl: INT): T is
      res := [cl];
      if res /= void then yield; end;
   end;
end;

Figure 17: Specialized implementation of $DIST{T} with a single chunk per cluster

The as expression in the header of the statement defines a chunk variable that relates a distributed object to a variable referring to the corresponding chunk throughout the body of the dist-statement. The body thread is always executed on the cluster the corresponding chunk is located on. The purpose of this semantics is to bind parallel computation to the location of data, in order to exploit locality. It is possible to specify more than one distributed object in the header if all the distributed objects are pairwise aligned, i.e. x.is_aligned_with(y) is true for every pair x and y in the header. An exception of type ALIGNMENT_ERROR is raised if the distributed objects are not all aligned. (Of course, it is always possible to refer within the body to additional distributed objects that are not aligned with the ones mentioned in the header.) All expressions before the as in the header must be of types descended from $DIST{T}. The chunk variables automatically get the type of the chunk, i.e. d as c defines a variable c of type T where d's type is a subtype of $DIST{T}.

plus(a: SAME): SAME
   -- A new vector equal to self plus a.
   pre is_aligned_with(a)
is
   -- create result object
   res := SAME::create;
   dist res as res_c, self as c, a as a_c do
      res_c.to_sum_of(c, a_c);
   end;
end;

Figure 18: Use of dist-statement in DVEC class

Before continuing with details of the dist-statement semantics, Figure 18 shows an example from the DVEC class for distributed vectors, illustrating the practical use of the dist-statement. (There are more examples from this class in section 7.3.) The DVEC class for distributed vectors is a $DIST{T}-descendant with sequential vectors as chunks. This is an important construction principle: many distributed classes are built on top of their sequential counterparts by distributing sequential objects with the same functionality as chunks of a distributed data structure.

The plus routine adds two distributed vectors and creates a new one as the result. The precondition for addition is that self and the argument a are aligned to each other. Alignment for distributed vectors has the generic semantics for distributed objects, requiring that two objects have the same number of chunks and that chunks are located pairwise on the same cluster (cf. section 7.1). Furthermore, the vector chunks are required to have the same dimension pairwise. After checking for alignment in the precondition, the routine plus continues by creating a new distributed vector of the same dimension as self for holding the sum. The actual computation is performed in the body of the dist-statement over res, self, and a with chunk variables res_c, c, and a_c, respectively. Each body thread just uses the to_sum_of routine of ordinary vectors to sum over the single chunks. This is again a very typical pattern in distributed data structures: the distributed operation is just a distributed application of the ordinary operation.

The dist-statement is also a scope for body-local variables. Local variables of a dist-statement body and the chunk variables defined in the header are only visible within the body of the dist-statement.
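The behavior of the dist-statement in Figure 18 can be sketched as a rough Python analogy (all names hypothetical; Python threads stand in for pSather body threads, and plain lists stand in for vector chunks): one body thread per chunk pair, each summing one chunk of self and a into the corresponding result chunk.

```python
# Rough Python analogy of the dist-statement in DVEC::plus (Figure 18).
# One body thread per chunk; the join models the implicit multi-join
# at the end of the dist-statement.
import threading

def dist_plus(res_chunks, self_chunks, a_chunks):
    """Model of DVEC::plus: res_c := c + a_c, one thread per chunk."""
    def body(res_c, c, a_c):
        # models res_c.to_sum_of(c, a_c), run on the chunk's home cluster
        for i in range(len(res_c)):
            res_c[i] = c[i] + a_c[i]

    threads = [threading.Thread(target=body, args=triple)
               for triple in zip(res_chunks, self_chunks, a_chunks)]
    for t in threads: t.start()
    for t in threads: t.join()  # dist-statement waits for all body threads

self_chunks = [[1, 2], [3, 4]]      # two chunks of a distributed vector
a_chunks    = [[10, 20], [30, 40]]  # aligned: same number and layout of chunks
res_chunks  = [[0, 0], [0, 0]]
dist_plus(res_chunks, self_chunks, a_chunks)
print(res_chunks)  # [[11, 22], [33, 44]]
```

The sketch also shows the construction principle mentioned above: the per-chunk body is just the ordinary sequential operation, and the dist-statement merely applies it to every chunk in parallel.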
There is one instance of each body-local variable per body thread. The same instance of every variable in the surrounding scope is visible from the body threads of a dist-statement (i.e. local variables and parameters of the enclosing routine, self as an implicit parameter and with it all attributes of the corresponding object, local variables of surrounding dist-statements, and shared and constant attributes). Assignments to all these different classes of variables (except constant attributes) are allowed from within body threads, but note that the general rules for atomicity and consistency of memory operations (cf. section 3.5) apply in the context of the dist-statement, under the assumption that the beginning and the end of a dist-statement correspond to implicit thread multi-fork and multi-join operations. Thus, different body threads do not necessarily see a consistent picture of the shared variables unless they employ explicit synchronization operations like lock-statements around accesses to shared variables. Note also that the end of the dist-statement is a synchronization operation enforcing consistency in the sense of section 3.5.2.

Implementation note: A naive implementation of the above semantics could lead to a completely sequential execution of a dist-statement because of the memory bottleneck at the location of the local variables of the surrounding scope. An important optimization is to pass variables that are only read in the body by value to each body thread. All accesses to these variables are then local. Note, however, that only pointers are passed for reference objects; for local access to such objects we first need to replicate them. For techniques to prevent read and write bottlenecks see sections 6 and 7.3.
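The two access disciplines described above can be sketched in Python (a hypothetical model, not pSather; Python threads again stand in for body threads): a variable that is only read in the body can safely be handed to each thread as a private copy, while a surrounding-scope variable that body threads write needs explicit synchronization, analogous to lock-statements around shared accesses.

```python
# Sketch of the two access disciplines for surrounding-scope variables
# (hypothetical Python model of dist-statement body threads).
import threading

N_THREADS = 4
scale = 10                 # read-only in the body: safe to pass by value
total = 0                  # written by all body threads: needs a lock
lock = threading.Lock()

def body(scale_local, k):
    # 'scale_local' models a read-only variable passed by value:
    # every access to it is local to the body thread.
    contribution = scale_local * k
    # Writes to a surrounding-scope variable require explicit
    # synchronization, as with lock-statements in pSather.
    global total
    with lock:
        total += contribution

threads = [threading.Thread(target=body, args=(scale, k))
           for k in range(N_THREADS)]
for t in threads: t.start()
for t in threads: t.join()
print(total)  # 10*(0+1+2+3) = 60
```

Without the lock the result of the read-modify-write on total would be timing-dependent; the by-value copy of scale, by contrast, needs no synchronization at all, which is exactly why the optimization removes the memory bottleneck for read-only variables.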
The dist-statement belongs to the group of structured statements which have to perform a termination action. In the case of the dist-statement, proper structured termination means waiting for all body threads to terminate. This is very similar to the cobegin-statement in section 3.1. This leads to the following semantic rules: If an exception is passed on beyond the end of a dist-statement, the statement waits for termination of all its body threads before passing the exception on to the next outer handler. If more than one body thread terminates exceptionally, only one of the exceptions is passed on beyond the end of the statement. One may look at the dist-statement as an implicit exception handler that handles all exceptions by properly synchronizing all body threads and passing one of the exceptions on to the next outer handler.
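This termination rule can be sketched as follows (a hypothetical Python model, with run_dist standing in for the dist-statement): every body thread is joined before anything propagates, and if any bodies terminated exceptionally, exactly one of their exceptions is re-raised afterwards.

```python
# Sketch of the dist-statement's termination action (hypothetical model):
# join *all* body threads first, then pass on at most one exception.
import threading

def run_dist(bodies):
    errors = []
    def wrap(body):
        try:
            body()
        except Exception as e:
            errors.append(e)       # remember, but let sibling threads finish
    threads = [threading.Thread(target=wrap, args=(b,)) for b in bodies]
    for t in threads: t.start()
    for t in threads: t.join()     # termination action: wait for every body
    if errors:
        raise errors[0]            # only one exception is passed on

flags = []
def ok(): flags.append("ok")
def bad(): raise RuntimeError("chunk failure")

try:
    run_dist([ok, bad, ok])
except RuntimeError:
    pass
assert flags == ["ok", "ok"]   # non-failing bodies still ran to completion
```

The point of the join-before-raise ordering is the same as for the cobegin-statement: no exception escapes the statement while body threads are still running.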
