Cache behavior of combinator graph reduction

The results of cache-simulation experiments with an abstract machine for reducing combinator graphs are presented. The abstract machine, called TIGRE, exhibits reduction rates that, for similar kinds of combinator graphs on similar kinds of hardware, compare favorably with previously reported techniques. Furthermore, TIGRE maps easily and efficiently onto standard computer architectures, particularly those that allow a restricted form of self-modifying code. This suggests that the conventional "stored program" organization of computer systems is not necessarily inappropriate for functional programming language implementations. This is not to say, however, that present-day computer systems are well equipped to reduce combinator graphs. In particular, the behavior of the cache memory has a significant effect on performance. To study and quantify this effect, trace-driven cache simulations of a TIGRE graph reducer running on a reduced instruction-set computer are conducted. The results of these simulations are presented with the following hardware-cache parameters varied: cache size, block size, associativity, memory update policy, and write-allocation policy. The cache organization of a commercially available system is used as the starting point, and the sensitivity of performance to variations in each parameter is then measured. From the results of the simulation study, it is concluded that combinator-graph reduction using TIGRE runs most efficiently with a cache memory that uses an allocate-on-write-miss strategy, a moderately large block size (preferably with subblock placement), and copy-back memory updates.
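To make the varied parameters concrete, the following is a minimal sketch of a trace-driven cache simulator in Python. It is not the simulator used in the study: the class name `Cache`, the helper `simulate`, the LRU replacement policy, and the synthetic trace are all assumptions introduced here for illustration. Subblock placement is not modeled.

```python
from collections import OrderedDict


class Cache:
    """Hypothetical set-associative cache model with LRU replacement (an assumption)."""

    def __init__(self, size, block, assoc, write_allocate=True, copy_back=True):
        self.block = block                      # block size in bytes
        self.assoc = assoc                      # lines per set
        self.num_sets = size // (block * assoc)
        self.write_allocate = write_allocate    # allocate a line on a write miss?
        self.copy_back = copy_back              # copy-back (True) or write-through (False)
        # One OrderedDict per set: tag -> dirty flag, ordered from LRU to MRU.
        self.sets = [OrderedDict() for _ in range(self.num_sets)]
        self.accesses = 0
        self.misses = 0
        self.mem_reads = 0                      # block fetches from memory
        self.mem_writes = 0                     # write transactions to memory

    def access(self, addr, is_write):
        """Simulate one reference; return True on a hit."""
        self.accesses += 1
        block_no = addr // self.block
        lines = self.sets[block_no % self.num_sets]
        tag = block_no // self.num_sets
        hit = tag in lines
        if hit:
            lines.move_to_end(tag)              # mark as most recently used
        else:
            self.misses += 1
            if not is_write or self.write_allocate:
                self._fill(lines, tag)
        if is_write:
            if self.copy_back and tag in lines:
                lines[tag] = True               # defer the memory update
            else:
                self.mem_writes += 1            # write-through, or unallocated write miss
        return hit

    def _fill(self, lines, tag):
        """Fetch a block, evicting the LRU line (and writing it back if dirty)."""
        if len(lines) >= self.assoc:
            _, dirty = lines.popitem(last=False)
            if dirty:
                self.mem_writes += 1
        lines[tag] = False
        self.mem_reads += 1


def simulate(trace, **params):
    """Run a trace of (address, is_write) pairs through a fresh cache."""
    cache = Cache(**params)
    for addr, is_write in trace:
        cache.access(addr, is_write)
    return cache


# Synthetic trace (an assumption, not data from the study): each fresh
# cell is written once and then read back, one cell per 16-byte block.
trace = [(a, w) for a in (0, 16, 32, 48) for w in (True, False)]

with_alloc = simulate(trace, size=64, block=16, assoc=1, write_allocate=True)
no_alloc = simulate(trace, size=64, block=16, assoc=1, write_allocate=False)
print(with_alloc.misses, no_alloc.misses)
```

On this toy trace, allocating on a write miss lets the subsequent read hit, so the allocating configuration incurs half as many misses as the non-allocating one. This only illustrates the mechanism; it says nothing about the magnitudes measured for TIGRE.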
