Tolerating late memory traps in ILP processors

ILP processors can execute a large number of instructions simultaneously, which makes it increasingly difficult to support traps efficiently. On the other hand, a current trend in architecture is to support various memory functions in software rather than hardware, usually by trapping the execution processor on a cache miss, a TLB miss, or a failed access to a local or remote memory. These late memory traps block the faulting instruction at the top of the active list, backing up the pipeline. Moreover, the support for late memory traps may also affect the performance of non-faulting memory instructions. In this paper, we analyze the overhead caused by late memory traps in ILP processors and define several measures of this overhead. To tolerate late memory traps, we propose hardware prefetching of exception conditions and a tagged store buffer that implements deferred traps on stores. We show that, with these hardware optimizations, the overhead added by the lateness of traps is significantly reduced relative to the overhead of early traps. Because of caching effects, the frequency of late memory traps usually decreases as they are taken deeper in the memory hierarchy, and their overall impact on execution time becomes negligible.
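
To make the idea of deferred traps on stores more concrete, the following is a minimal software sketch of a tagged store buffer. It is a toy model under stated assumptions, not the hardware design evaluated in the paper: the names `StoreEntry`, `TaggedStoreBuffer`, `retire_store`, `drain_one`, and `resolve`, the use of a set of "faulting" addresses, and the FIFO draining policy are all hypothetical choices made for illustration. The sketch assumes that a store leaves the active list as soon as it enters the buffer, and that an exception detected later is reported through the entry's tag instead of blocking retirement.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class StoreEntry:
    tag: int      # hypothetical identifier used to report a deferred trap
    address: int
    value: int


class TaggedStoreBuffer:
    """Toy model: stores retire into the buffer and drain to memory later."""

    def __init__(self, faulting_addresses):
        self.entries = deque()
        self.memory = {}
        # Assumption: a plain set stands in for whatever condition
        # (TLB miss, failed remote access, ...) would raise a late trap.
        self.faulting = set(faulting_addresses)
        self.next_tag = 0

    def retire_store(self, address, value):
        """Buffer the store without waiting to learn whether it traps."""
        tag = self.next_tag
        self.next_tag += 1
        self.entries.append(StoreEntry(tag, address, value))
        return tag

    def drain_one(self):
        """Try to write the oldest store to memory.

        Returns None if the buffer is empty or the store completed,
        or the faulting entry if a deferred trap must be taken.
        """
        if not self.entries:
            return None
        entry = self.entries[0]
        if entry.address in self.faulting:
            # Deferred trap: the entry stays buffered until it is resolved.
            return entry
        self.entries.popleft()
        self.memory[entry.address] = entry.value
        return None

    def resolve(self, address):
        """Model the software handler repairing the faulting condition."""
        self.faulting.discard(address)


buf = TaggedStoreBuffer(faulting_addresses={0x200})
buf.retire_store(0x100, 1)        # retires immediately
buf.retire_store(0x200, 2)        # also retires; its trap is deferred
buf.drain_one()                   # first store reaches memory
trap = buf.drain_one()            # second store reports a deferred trap
buf.resolve(trap.address)         # hypothetical handler fixes the condition
buf.drain_one()                   # the store now completes
```

In this sketch, the tag is what lets the handler identify which buffered store faulted after the store has already left the active list; a real design would also have to address ordering and forwarding, which the sketch ignores.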