This paper describes the design and evaluation of an auto-memoization processor. The key point of the proposal is that multilevel functions and loops are detected without any additional compiler-controlled instructions. This general-purpose processor detects functions and loops and memoizes them automatically and dynamically. Hence, any load module or binary program can gain speedup without recompilation or rewriting.
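As a rough software analogy for the memoization idea above, the following minimal sketch records the inputs and result of a region the first time it runs and reuses the result when the same inputs recur. The names `ReuseTable`, `run`, and `square_sum` are hypothetical and introduced only for illustration; the processor does this in hardware on unmodified binaries, which this sketch does not model.

```python
class ReuseTable:
    """Hypothetical software stand-in for a table of past region executions."""

    def __init__(self):
        self.entries = {}  # (region name, inputs) -> recorded result
        self.hits = 0
        self.misses = 0

    def run(self, region, *inputs):
        key = (region.__name__, inputs)
        if key in self.entries:
            # Same region, same inputs: skip execution and reuse the result.
            self.hits += 1
            return self.entries[key]
        # First time with these inputs: execute and record the result.
        self.misses += 1
        result = region(*inputs)
        self.entries[key] = result
        return result


def square_sum(a, b):
    # Illustrative side-effect-free region; its result depends only on a and b.
    return a * a + b * b


table = ReuseTable()
first = table.run(square_sum, 3, 4)   # executed and recorded
second = table.run(square_sum, 3, 4)  # served from the table
```

The sketch only works for regions whose outputs depend solely on their recorded inputs, which is the same condition that makes a region memoizable.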
We also propose parallel execution by multiple speculative cores and one main memoizing core. While the main core executes a memoizable region, the speculative cores execute the same region simultaneously, using predicted inputs. This can omit the execution of instruction regions whose inputs increase or decrease monotonically, and may make effective use of surplus cores in the coming many-core era.
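To illustrate why monotonically changing inputs can still benefit, here is a minimal sketch that assumes a simple constant-stride predictor: the next inputs are guessed from the difference between the last two observed inputs, and the region is precomputed for those guesses. The predictor, the function names, and the sequential simulation of the speculative cores are all assumptions made for this example, not the mechanism described in the paper.

```python
def predict_next_inputs(history, count):
    """Guess `count` future inputs from the stride of the last two observed ones."""
    if len(history) < 2:
        return []  # not enough history to estimate a stride
    stride = history[-1] - history[-2]
    return [history[-1] + stride * k for k in range(1, count + 1)]


def region(x):
    # Illustrative memoizable region.
    return x * x + 1


memo = {}                        # shared table: input -> result
stats = {"main_executions": 0}   # how often the main core really ran the region


def main_core(x):
    if x in memo:
        return memo[x]           # result already stored, execution omitted
    stats["main_executions"] += 1
    memo[x] = region(x)
    return memo[x]


def speculative_cores(history, cores=3):
    # Simulated sequentially here; each predicted input stands for one core.
    for x in predict_next_inputs(history, cores):
        if x not in memo:
            memo[x] = region(x)


history = []
results = []
for x in range(0, 40, 4):        # inputs increase monotonically by 4
    results.append(main_core(x))
    history.append(x)
    speculative_cores(history)
```

In this run the main core executes the region only for the first two inputs, before a stride can be estimated; every later input has already been computed speculatively.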
Experiments with the GENEsYs genetic algorithm programs show that our auto-memoization processor achieves a large speedup: up to 7.1-fold, and 1.6-fold on average. Experiments with the SPEC CPU95 benchmark suite show that auto-memoization with three speculative cores achieves up to 2.9-fold speedup for 102.swim, and 1.4-fold on average. They also show that parallel execution by the speculative cores reduces cache misses, much as prefetching does.