Fast efficient simulation of write-buffer configurations

Write buffers have a significant impact on performance, especially in wide-issue superscalar systems with write-through caching. We develop fast, efficient methods for evaluating multiple write-buffer configurations together in a single simulation pass. Our results also apply to the simulation of other buffer structures. We first consider non-coalescing write buffers. We show that a given buffer stalls only when smaller buffers also stall, and we develop an algorithm in which only the smallest buffer is simulated explicitly; the states of the larger buffers are updated only when a smaller buffer stalls. Empirical comparisons show a speedup of up to 7.4 over simpler methods. We then extend this algorithm to simulate multiple coalescing write buffers, where it achieves a speedup of up to 3.5. Finally, we demonstrate the impact of write buffers on CPI by presenting write-buffer simulation results for four SPEC benchmarks.
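To make the problem concrete, the following is a minimal sketch of the *simpler* baseline that the abstract compares against. It is not the authors' algorithm. A single pass over a store trace updates every non-coalescing buffer configuration on every store. The timing model is an assumption chosen for illustration: there is one memory port, each entry takes a fixed `drain` cycles to retire, entries retire in FIFO order, and the processor stalls on a full buffer until the oldest entry retires. The names `FifoWriteBuffer` and `simulate_single_pass` are hypothetical. The paper's optimization would replace the inner loop. Because a buffer stalls only when smaller buffers do, only the smallest buffer needs explicit per-store work, and the larger ones are touched when it stalls. That lazy-update scheme is not reproduced here.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class FifoWriteBuffer:
    """One hypothetical non-coalescing write-buffer configuration.

    Assumed timing model (illustrative only): a single memory port,
    a fixed `drain` latency per entry, FIFO retirement, and a processor
    stall on a full buffer until the oldest entry retires.
    """
    depth: int                                      # number of buffer entries
    drain: int                                      # cycles to retire one entry
    pending: deque = field(default_factory=deque)   # finish times of queued stores
    delay: int = 0                                  # cumulative stall cycles so far
    port_free: int = 0                              # cycle the memory port is next idle

    def store(self, nominal: int) -> int:
        """Insert a store issued at stall-free cycle `nominal`; return its stall."""
        now = nominal + self.delay                  # shift by stalls already suffered
        # Retire every entry that has finished draining by `now`.
        while self.pending and self.pending[0] <= now:
            self.pending.popleft()
        stall = 0
        if len(self.pending) == self.depth:
            # Buffer full: wait for the oldest entry to retire.
            stall = self.pending.popleft() - now
            self.delay += stall
            now += stall
        # Entries drain one at a time through the single memory port.
        start = max(now, self.port_free)
        self.port_free = start + self.drain
        self.pending.append(self.port_free)
        return stall


def simulate_single_pass(trace, depths, drain):
    """Naive single pass: every configuration is updated on every store."""
    buffers = [FifoWriteBuffer(d, drain) for d in sorted(depths)]
    for nominal in trace:
        for buf in buffers:
            buf.store(nominal)
    return {buf.depth: buf.delay for buf in buffers}
```

For example, take four back-to-back stores at cycles 0 through 3 with a 5-cycle drain. `simulate_single_pass([0, 1, 2, 3], [1, 2, 3, 4], drain=5)` yields `{1: 12, 2: 7, 3: 2, 4: 0}`. Larger buffers stall less, which is consistent with the inclusion property the abstract describes. This baseline costs one update per configuration per store. That per-configuration cost is what the explicit-smallest-buffer algorithm avoids.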