The misconstrued semicolon: reconciling imperative languages and dataflow machines

APPLICATIONS

In the early days of optimization, particular applications of flow analysis and their relative merits were focal points of research. Later [Kild73] it was realized that many applications share some of the most important problems, and the emphasis shifted to research into methods that can be useful for many applications. Such generalized analysis methods need some general notion of an application.

Each flow analysis problem can be seen as the association of a set of assertions concerning local properties with particular points in a program, and the propagation of this information through the program so that it can be checked for consistency or combined into more global assertions. We will consider an application to be a pair (A, P), where A is a set of assertions and P a set of propagation rules. Each assertion provides information about a particular property of a program and a propagation rule specifies the interaction between assertions. Assertions are associated with arcs and propagation rules with nodes.

In the general case the propagation rule associated with a node with p incoming and q outgoing arcs is a function $A^{p+q} \to A^{p+q}$. The inputs and outputs of the function are the old and new assertions associated with the arcs. Two special cases are distinguished. In a forward application the information flows in the same direction as control and each propagation rule is a function $A^{p+q} \to A^{q}$. In a backward application the information flows in the opposite direction and each propagation rule is a function $A^{p+q} \to A^{p}$.

A solution consists of the association of a (final) assertion with each arc in the program such that all propagation rules are satisfied. Not all solutions are good ones: a trivial solution for the problem illustrated in figure 4.2 could be the minimum assertion "The value of each variable is Unknown".
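The notions above can be made concrete with a small sketch. It is only an illustration under assumptions of our own, not a method taken from the literature cited here. The application is a forward analysis in the style of constant propagation: an assertion maps each variable to a constant or to Unknown, and each rule takes the old assertions on the p incoming and q outgoing arcs of its node and returns new assertions for the q outgoing arcs. All names (assign, fork, join, propagate, is_solution, the arc labels and the example program) are hypothetical. The example program is acyclic; nothing is claimed about termination of this simple loop on graphs with cycles.

```python
UNKNOWN = "Unknown"   # "the value of this variable is Unknown"
VARS = ("x", "y")


def blank():
    """The minimum assertion: nothing is known about any variable."""
    return {v: UNKNOWN for v in VARS}


# Forward propagation rules: (old incoming, old outgoing) -> new outgoing,
# i.e. functions from A^(p+q) to A^q.

def assign(var):
    """Rule for a node that assigns to `var` (p = 1, q = 1)."""
    def rule(ins, outs):
        new = dict(ins[0])        # other variables pass through unchanged
        new[var] = outs[0][var]   # `var` keeps what the out arc already asserts
        return [new]
    return rule


def fork(ins, outs):
    """Rule for a branch node (p = 1, q = 2): both out arcs copy the in arc."""
    return [dict(ins[0]), dict(ins[0])]


def join(ins, outs):
    """Rule for a merge node (p = 2, q = 1): keep what both in arcs agree on."""
    a, b = ins
    return [{v: a[v] if a[v] == b[v] else UNKNOWN for v in VARS}]


# Hypothetical program:  y := 2; if ... then x := 1 else x := 1
# Each node is (rule, incoming arcs, outgoing arcs).
ARCS = ["entry", "a", "t", "f", "t2", "f2", "exit"]
NODES = [
    (assign("y"), ["entry"], ["a"]),
    (fork, ["a"], ["t", "f"]),
    (assign("x"), ["t"], ["t2"]),
    (assign("x"), ["f"], ["f2"]),
    (join, ["t2", "f2"], ["exit"]),
]


def apply_rule(node, assertions):
    rule, ins, outs = node
    return rule([assertions[a] for a in ins], [assertions[a] for a in outs])


def is_solution(assertions):
    """True if every propagation rule reproduces the assertions on its out arcs."""
    return all(apply_rule(n, assertions) == [assertions[a] for a in n[2]]
               for n in NODES)


def propagate(initial):
    """Apply the rules repeatedly until no assertion changes."""
    assertions = {arc: dict(a) for arc, a in initial.items()}
    changed = True
    while changed:
        changed = False
        for node in NODES:
            for arc, value in zip(node[2], apply_rule(node, assertions)):
                if assertions[arc] != value:
                    assertions[arc] = value
                    changed = True
    return assertions


# Initial assertions: the local effect of each assignment on its out arc.
initial = {arc: blank() for arc in ARCS}
initial["a"]["y"] = 2
initial["t2"]["x"] = 1
initial["f2"]["x"] = 1

trivial = {arc: blank() for arc in ARCS}   # "everything is Unknown"
solution = propagate(initial)

print(solution["exit"])        # {'x': 1, 'y': 2}
print(is_solution(solution))   # True
print(is_solution(trivial))    # True
print(is_solution(initial))    # False
```

In this sketch both the propagated assignment and the all-Unknown assignment satisfy every propagation rule, whereas the initial assertions on their own do not. Satisfying the rules is therefore not enough to single out a useful solution.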
The information contained in the initial assertions should not be lost. To capture this notion a partial ordering is associated with the set of assertions, and it is usually assumed that it forms at least a meet semilattice. This implies that there is a minimum assertion, which is implied by all other assertions, and a meet operation, which extracts the information that two assertions have in common. A good solution is one which implies all initial assertions. It is desirable to obtain not just a good solution, but a maximum one, i.e. a good solution that is not implied by any other solution.

One way of obtaining a solution is by propagating information through the graph, each time using the propagation rule of a node to update the assertions on the associated arcs, until a stable situation is reached. In such an iterative method only individual assertions are changed and the propagation rules remain untouched. If assertions are never replaced by smaller ones (guaranteed if all propagation rules are order-preserving) and if the assertion lattice is bounded, it is certain that a good solution will be reached. A maximum solution will be reached when the application is distributive [Kild73].

Other, so-called elimination methods summarize the effect of a whole subgraph by replacing a set of propagation rules by a new one. These methods are usually faster than iterative methods, but the class of applications that they can handle is more restricted. The set of propagation rules has to be closed under functional composition and pointwise meet. Cycles present problems because the effect of unbounded paths must be expressible as a propagation rule and it must be computable in a bounded number of steps. Rosen and Graham & Wegman have investigated the minimum requirements that guarantee a good solution using such a method [Grah76, Rose80].

4.2. Existing Methods

As indicated in the previous section, a flow analysis problem is solved in two steps.
• Assertions and propagation rules are associated with certain points in the program.
• Information is propagated through the program by combining assertions and/or propagation rules into new ones until a stable situation is reached.

The initial assertions and propagation rules describe the local effect of separate operations. This is trivial for atomic operations, but the local effect of a procedure call can only be determined through extensive analysis. Flow analysis that does not concern itself with the relationship between procedures is called intraprocedural; all other analysis is interprocedural. If interprocedural analysis is omitted, a conservative approximation of the effect of a procedure call must be used, which limits the quality of the information that can be obtained. In the rest of this chapter strategies for interprocedural and intraprocedural analysis are discussed separately.

4.2.1. INTERPROCEDURAL ANALYSIS

Interprocedural analysis is an active area of research and we give only an indication of its problems rather than attempt to survey its present state. Important articles in this field are [Alle74, Bart78, Rose79]. A normal procedure call (i.e. not a coroutine call) consists of two transfers of control: from the calling to the called procedure and back to the calling procedure. These jumps are not independent, since a call will never be followed by a return to another procedure. One consequence of this is that not every path through the call graph (the graph that expresses calling relationships between procedures) is a valid control path. The challenge of interprocedural analysis is to exploit this information about the control flow patterns to obtain a better solution.

A simple but expensive method is inline expansion: each procedure call is replaced by a copy of the procedure body and only the intraprocedural analysis of the root procedure is required.
Its obvious drawbacks are that much analysis is duplicated and that recursion cannot be handled.

A popular approach is to split the analysis into two phases. In the first phase a summary of the effect of each procedure is constructed by a rough analysis of its body, ignoring any procedure calls. A transitive closure algorithm is then used to incorporate all direct and indirect procedure calls into the summaries. In the second phase the final analysis is performed using the summary information whenever a procedure call is encountered. The quality of the method depends on the quality of the information gathered in the first phase, which in turn is limited by the fact that the local effect of a procedure call is necessarily overestimated.

In [Shar81] two methods are described which aim at removing this deficiency. The functional approach analyzes each procedure and expresses its effect in a set of relations between assertions at entry and exit points. Since these relations are interdependent, iteration is required to arrive at a fixed point. This method belongs to the elimination methods and is only useful for a restricted class of applications (see previous section). In the call string approach procedure call and return are treated as separate jumps, but an identification of each procedure call encountered during information propagation is tagged onto the propagated information. When a return is encountered this call string tag is used to select the correct control path. A generalization of both methods is described in [Jone82].

Most methods simplify the problem of interprocedural analysis by excluding those language features that lead to serious complications. One complication is aliasing, which arises when different access paths (such as variable names) refer to the same object. It can occur if the language allows pointer values or call-by-reference parameters. A second complication arises when it is statically (i.e.
during analysis) difficult to determine which procedure is being called. This can occur if the language allows variables or parameters to have procedures as values or when operators and procedure names are overloaded. An extensive treatment of these problems appears in [Weih80], where it is shown that obtaining precise information in the presence of procedure variables is P-space hard.

4.2.2. INTRAPROCEDURAL ANALYSIS

The many strategies that have been proposed for flow analysis fall into groups distinguished by the level of program representation operated upon. It is still a matter of debate which level is most appropriate. The choice is between the source text, the generated code, or any of the levels in between. Analysis of the source text always incorporates some form of lexical and syntactical analysis. Analysis of the generated code is the natural domain for machine dependent optimization; the work that has been done in this area is rather ad hoc and does not have much general applicability. Therefore, most general methods operate on some intermediate level. The ideal would be a representation in which all information that is not helpful for the analysis has been removed and all information that can be helpful is easily retrievable. Although many intermediate representations can be devised, two levels are of particular importance to flow analysis:

• In a branch level representation the hierarchical structure of the program has been lost and the control flow is entirely encoded by jumps. An example is the representation in three address code, where each instruction corresponds to a typical machine instruction; the difference with assembly level is that register allocation has not yet been performed. Analysis methods that work on this level are called low level.
• In a syntax level representation the program has the form of a graph supplemented with tables. The graph is usually a tree such as a parse tree.
The nesting of statements, which has an important influence on the analysis, is directly reflected in the graph structure, which is not cluttered by layout, variable names, and other details not relevant to flow analysis. Analysis methods that work on this level are called high level.

As Rosen, who f

[1]  Jeffrey M. Barth A practical interprocedural data flow analysis algorithm , 1978, CACM.

[2]  John Cocke,et al.  A program data flow analysis procedure , 1976, CACM.

[3]  Mark N. Wegman,et al.  A Fast and Usually Linear Algorithm for Global Flow Analysis , 1976, J. ACM.

[4]  Gary A. Kildall,et al.  A unified approach to global program optimization , 1973, POPL.

[5]  John Darlington,et al.  ALICE a multi-processor reduction machine for the parallel evaluation of applicative languages , 1981, FPCA '81.

[6]  David A. Padua,et al.  Dependence graphs and compiler optimizations , 1981, POPL '81.

[7]  Karl J. Ottenstein,et al.  A program form based on data dependency in predicate regions , 1983, POPL '83.

[8]  Keshav Pingali,et al.  Efficient demand-driven evaluation. II , 1983 .

[9]  Ian Watson,et al.  The Manchester prototype dataflow computer , 1985, CACM.

[10]  Alfred V. Aho,et al.  Principles of Compiler Design , 1977 .

[11]  Kim P. Gostelow,et al.  Performance of a Simulated Dataflow Computer , 1980, IEEE Transactions on Computers.

[12]  Ryuzo Hasegawa,et al.  A list-processing-oriented data flow machine architecture , 1982, AFIPS '82.

[13]  Paul Klint,et al.  An overview of the SUMMER programming language , 1980, POPL '80.

[14]  Jack B. Dennis,et al.  Programming generality, parallelism and computer architecture , 1968, IFIP Congress.

[15]  A. Davis A data flow evaluation system based on the concept of recursive locality* , 1979, 1979 International Workshop on Managing Requirements Knowledge (MARK).

[16]  Robert E. Tarjan,et al.  Fast Algorithms for Solving Path Problems , 1981, JACM.

[17]  Robert E. Tarjan,et al.  Depth-First Search and Linear Graph Algorithms , 1972, SIAM J. Comput..

[18]  Philippe Flajolet,et al.  The Average Height of Binary Trees and Other Simple Trees , 1982, J. Comput. Syst. Sci..

[19]  William E. Weihl,et al.  Interprocedural data flow analysis in the presence of pointers, procedure variables, and label variables , 1980, POPL '80.

[20]  Forbes J. Burkowski A multi-user data flow architecture , 1981, ISCA '81.

[21]  John Glauert,et al.  SISAL: streams and iteration in a single-assignment language. Language reference manual, Version 1. 1 , 1983 .

[22]  Arvind,et al.  A Computer Capable of Exchanging Processors for Time , 1977, IFIP Congress.

[23]  Jack B. Dennis,et al.  Building blocks for data flow prototypes , 1980, ISCA '80.

[24]  Amitava Hazra A description method and a classification scheme for data flow architectures , 1982, ICDCS.

[25]  Barry K. Rosen Data Flow Analysis for Procedural Languages , 1979, JACM.

[26]  Alfred V. Aho,et al.  Node listings for reducible flow graphs , 1975, STOC '75.

[27]  John R. Gurd,et al.  Generation of dataflow graphical object code for the Lapse programming language , 1981, CONPAR.

[28]  Ian Watson,et al.  Preliminary Evaluation of a Prototype Dataflow Computer , 1983, IFIP Congress.

[29]  Ken Kennedy,et al.  An algorithm for reduction of operator strength , 1977, Commun. ACM.

[30]  Jack B. Dennis,et al.  A preliminary architecture for a basic data-flow processor , 1974, ISCA '75.

[31]  Chris R. Jesshope,et al.  Parallel Computers 2: Architecture, Programming and Algorithms , 1981 .