Compilation and delayed evaluation in APL

Most existing APL implementations are interpretive in nature; that is, each time an APL statement is encountered it is executed by a body of code that is completely general, i.e. capable of evaluating any APL expression, and in no way tailored to the statement at hand. This costly generality is said to be justified because APL variables are typeless and can therefore vary arbitrarily in type, shape, and size during the execution of a program. What this argument overlooks is that the operational semantics of an APL statement are not modified by the varying storage requirements of its variables.

The first proposal for an implementation that is not fully interpretive was the thesis of P. Abrams [1], in which a high-level interpreter can defer certain operations by compiling code that a low-level interpreter is later called upon to execute. The benefit thus gained is that intelligence gathered from a wider context can be brought to bear on the evaluation of a subexpression. Thus, in evaluating (A+B)[I], only the addition A[I]+B[I] will be performed. More recently, A. Perlis and several of his students at Yale [9,10] have presented a scheme by which a full-fledged APL compiler can be written; the compiled code can then be executed very efficiently on a specialized hardware processor. A similar scheme is used in the newly released HP/3000 APL [12].

This paper builds on and extends the above ideas in several directions. We start by studying in some depth the two key notions all this work has in common, namely compilation and delayed evaluation, in the context of APL. By delayed evaluation we mean the strategy of deferring the computation of intermediate results until the moment they are needed. Thus large intermediate expressions are not built in storage; instead their elements are "streamed" in time. Delayed evaluation for APL was probably first proposed by Barton (see [8]).

Many APL operators do not correspond to any real data operations; their effect is merely to rename the elements of the array they act upon. A wide class of such operators, which we will call the grid selectors, can be handled by essentially pushing them down the expression tree and incorporating their effect into the leaf accessors. Semantically this is equivalent to the drag-along transformations described by Abrams. Performing this optimization will be shown to be an integral part of delayed evaluation.

In order to focus our attention on the above issues, we make a number of simplifying assumptions. We confine our attention to code compilation for single APL expressions, such as might occur in an "APL Calculator" where user-defined functions are not allowed. Of course, we will be critically concerned with the reusability of the compiled code for future evaluations. We also ignore the distinctions among the various APL primitive types and assume that all our arrays are of one uniform numeric type. We have studied the situation without these simplifying assumptions, but plan to report on that elsewhere.
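As a minimal illustration of delayed evaluation and selector push-down, consider the following Python sketch (not the paper's implementation; the node names Leaf, Add, and Select are invented for this example). Each node supplies one element on demand, and the selection in (A+B)[I] is absorbed into the demands sent toward the leaves at evaluation time, which is semantically equivalent to pushing the selector down the tree: only A[I]+B[I] is ever computed, and the intermediate array A+B is never built in storage.

    # A minimal sketch of delayed evaluation with selector push-down.
    # Each node exposes at(i): compute element i on demand.

    class Leaf:
        def __init__(self, data):
            self.data = data
        def at(self, i):
            return self.data[i]

    class Add:
        def __init__(self, left, right):
            self.left, self.right = left, right
        def at(self, i):
            # Element i of the sum is demanded; no full array is built.
            return self.left.at(i) + self.right.at(i)

    class Select:
        """The grid selector (A+B)[I]: rather than evaluating A+B and
        then indexing, the index map is composed into the demands sent
        to the subexpression, so only A[I[k]]+B[I[k]] is computed."""
        def __init__(self, expr, index):
            self.expr, self.index = expr, index
        def at(self, k):
            return self.expr.at(self.index[k])

    A = Leaf([1, 2, 3, 4])
    B = Leaf([10, 20, 30, 40])
    I = [3, 0]
    expr = Select(Add(A, B), I)
    print([expr.at(k) for k in range(len(I))])   # [44, 11]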
The following is a list of the main contributions of this paper.

- We present an algorithm for incorporating the selector operators into the accessors for the leaves of the expression tree. The algorithm runs in time proportional to the size of the tree, as opposed to its path length (which is the case for the algorithms of [10] and [12]). Although arbitrary reshapes cannot be handled by this algorithm, an especially important case can: that of a conforming reshape. The reshape AρB is called conforming if ρB is a suffix of A.

- By using conforming reshapes we can eliminate inner and outer products from the expression tree and replace them with scalar operators and reductions along the last dimension. We do this by introducing appropriate selectors on the product arguments and eventually absorbing these selectors into the leaf accessors (see the first sketch below). The same mechanism handles scalar extension, the convention that makes scalar operands of scalar operators conform to arbitrary arrays.

- Once products, scalar extensions, and selectors have been eliminated, what remains is an expression tree consisting entirely of scalar operators and reductions along the last dimension. As a consequence, during execution the dimension currently being worked on obeys a strict stack-like discipline. This implies that we can generate extremely efficient code that is independent of the ranks of the arguments.

- Several APL operators use the elements of their operands several times, and a pure delayed evaluation strategy would then require multiple reevaluations. We therefore introduce a general buffering mechanism, called slicing, which allows portions of a subexpression that will be repeatedly needed to be saved, avoiding future recomputation. Slicing is well integrated with the evaluation-on-demand mechanism. For example, when operators that break the streaming are encountered, slicing is used to determine the minimum-size buffer required to mediate between the order in which a subexpression can deliver its result and the order in which the full expression needs it (see the second sketch below).

- The compiled code is very efficient. A minimal number of loop variables is maintained, and accessors are shared among as many expression atoms as possible. Finally, the code generated is well suited for execution by an ordinary minicomputer, such as a PDP-11 or a Data General Nova. We have implemented this compiler on the Alto computer at Xerox PARC.

The plan of the paper is as follows. We start with a general discussion of compilation and delayed evaluation. We then motivate the structures and algorithms we need to introduce by showing how to handle a wider and wider class of the primitive APL operators. We discuss various ways of tailoring an evaluator to a particular expression. Some of this tailoring is possible based on the expression alone, while other optimizations require knowledge of (the sizes of) the atom bindings in the expression. The reader should always be alert to the kind of knowledge being used, for this affects the validity of the compiled code across reexecutions of a statement.
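As a small numerical check of the identity behind the product elimination above, the following Python sketch computes the inner product A+.×B of a 2×3 and a 3×2 array both directly and as a last-axis reduction of a scalar multiply over conformingly reshaped (and transposed) operands. The accessor formulation of the reshapes is our own illustration under these assumptions, not the paper's code.

    # Inner product via conforming reshape + last-axis reduction.
    A = [[1, 2, 3],
         [4, 5, 6]]          # shape (2, 3): m=2, n=3
    B = [[7, 8],
         [9, 10],
         [11, 12]]           # shape (3, 2): n=3, p=2
    m, n, p = 2, 3, 2

    # Direct inner product: direct[i][j] = sum_k A[i][k] * B[k][j].
    direct = [[sum(A[i][k] * B[k][j] for k in range(n))
               for j in range(p)] for i in range(m)]

    # Bt has shape (p, n); since (p, n) is a suffix of (m, p, n), the
    # reshape (m p n) rho Bt is conforming: it merely replicates Bt,
    # renaming elements without moving any data.
    Bt = [[B[k][j] for k in range(n)] for j in range(p)]
    B3 = lambda i, j, k: Bt[j][k]      # accessor for (m,p,n) rho Bt

    # Similarly (p m n) rho A is conforming; transposing its first two
    # axes (a pure selector) yields A3[i][j][k] = A[i][k].
    A3 = lambda i, j, k: A[i][k]

    # Scalar multiply, then reduction along the last dimension.
    derived = [[sum(A3(i, j, k) * B3(i, j, k) for k in range(n))
                for j in range(p)] for i in range(m)]

    assert derived == direct
    print(direct)   # [[58, 64], [139, 154]]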
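The second sketch illustrates the buffering role of slicing on one concrete streaming-breaking operator, reversal along the last axis; the choice of operator and all names are ours. The first element of each output row is the last element of the corresponding input row, so a buffer of exactly one row, the minimum, mediates between the order in which the subexpression delivers elements and the order in which the enclosing expression demands them.

    # A rough sketch of slicing, assuming elements arrive lazily in
    # row-major order from a demand-driven subexpression.
    def reverse_last_axis(elements, row_len):
        buf = []
        for x in elements:
            buf.append(x)                  # save the slice as it streams in
            if len(buf) == row_len:
                yield from reversed(buf)   # deliver it in demanded order
                buf.clear()                # reuse the buffer for the next row

    # Elements of a 2x3 array, produced one at a time.
    source = (x for x in [1, 2, 3, 4, 5, 6])
    print(list(reverse_last_axis(source, row_len=3)))   # [3, 2, 1, 6, 5, 4]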

[1] W. M. McKeeman. An approach to computer language design, 1966.

[2] Philip S. Abrams et al. An APL machine, 1970.

[3] Aaron J. Goldberg et al. Smalltalk-72 instruction manual, 1976.

[4] Peter Henderson et al. A lazy evaluator, POPL, 1976.

[5] Craig Schaffert et al. Abstraction mechanisms in CLU, Commun. ACM, 1977.

[6] Andrei P. Ershov et al. On the Essence of Compilation, Formal Description of Programming Concepts, 1977.