Measuring Limits of Fine-grained Parallelism

First, we investigate the limits of low-level parallelism for a collection of real-world programs and verify Wall's results. We measure the maximum parallelism of symbolic benchmarks on both superpipelined and superscalar architectures, and we consider the effects of register renaming. We then examine the effects of garbage collection strategies on low-level parallelism. In particular, we examine whether a garbage collector should optimize for CAR or CDR accesses. We also consider another model in which the CONS cells are split to make dereferencing fast, but which passes the cost of accessing memory to the fetch of the address. We show that a traditional depth-first copying garbage collector would better reduce run time on a low-level parallel machine with speculative execution if it were to follow CDR links before CAR links. Simulation is our principal data-gathering method. We have developed an instruction-level MIPS R3000 simulator called mipsi. Mipsi emulates some of the kernel's functionality and is thus able to do a full simulation of a user-level process without having to simulate kernel code as well. By adding appropriate schedulers to mipsi, we can simulate very powerful parallel machines based on the MIPS chip. Going a step further, we can simulate aspects of the run-time system, in particular the garbage collector, by simulating how much certain memory accesses would have cost had the garbage collector been implemented.
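To make the CAR-versus-CDR question concrete, the following is a minimal sketch of how the link order in a depth-first copying collector changes the layout of cells in to-space. It is an illustration only, not mipsi or the collector studied here: the names `Cons`, `make_list`, `copy_depth_first`, and `spine_addresses` are hypothetical, addresses are simply counters in copy order, and no timing or speculation is modeled.

```python
class Cons:
    """A CONS cell with a CAR and a CDR field (hypothetical model)."""
    __slots__ = ("car", "cdr")

    def __init__(self, car, cdr):
        self.car = car
        self.cdr = cdr


def make_list(items):
    """Build a proper list of CONS cells terminated by None."""
    head = None
    for item in reversed(items):
        head = Cons(item, head)
    return head


def copy_depth_first(root, first="cdr"):
    """Simulate a depth-first copy and return {id(cell): to-space address}.

    Addresses are assigned in the order cells are copied. `first` selects
    which link ("car" or "cdr") is followed before the other.
    """
    second = "car" if first == "cdr" else "cdr"
    address = {}
    stack = [root]
    while stack:
        cell = stack.pop()
        if not isinstance(cell, Cons) or id(cell) in address:
            continue  # atoms and already-copied cells need no work
        address[id(cell)] = len(address)
        # Push the deferred link first so the preferred link is popped next.
        stack.append(getattr(cell, second))
        stack.append(getattr(cell, first))
    return address


def spine_addresses(root, address):
    """Return the to-space addresses of the cells along the CDR chain."""
    result = []
    cell = root
    while isinstance(cell, Cons):
        result.append(address[id(cell)])
        cell = cell.cdr
    return result


# A list of three two-element sublists: ((1 2) (3 4) (5 6))
nested = make_list([make_list([1, 2]), make_list([3, 4]), make_list([5, 6])])

print(spine_addresses(nested, copy_depth_first(nested, first="cdr")))  # [0, 1, 2]
print(spine_addresses(nested, copy_depth_first(nested, first="car")))  # [0, 3, 6]
```

In this toy model, following CDR links first places the cells of a list's spine at consecutive addresses, while following CAR links first interleaves each spine cell with the structure hanging off its CAR. Whether that layout translates into shorter run time depends on the machine model, which is what the simulations are meant to measure.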