The Effect of Executing Mispredicted Load Instructions in a Speculative Multithreaded Architecture

Concurrent multithreaded architectures exploit both instruction-level and thread-level parallelism in application programs. A single-threaded sequencing mechanism needs speculative execution beyond conditional branches in order to exploit more instruction-level parallelism. In addition, an aggressive multithreaded architecture should also use thread-level control speculation in order to exploit more thread-level parallelism. The instruction- and thread-level speculative execution of load instructions in a multithreaded architecture has a greater impact on the performance of the cache hierarchy as the design becomes more aggressive, using wider-issue processors and more thread units. In this study, we investigate the effects of executing mispredicted load instructions on the cache performance of a scalable multithreaded computer system. The execution of loads down the wrongly predicted branch path within a thread unit or in a wrongly forked thread can result in an indirect prefetching effect for correct execution. This is possible even after the outcome of a control speculation is known. By allowing mispredicted load instructions to continue execution even after the instruction- or thread-level control speculation is known to have failed, we show that we can reduce the cache misses for the correctly predicted paths and threads. However, these additional loads can also increase the amount of memory traffic and can pollute the cache. Our results show that the performance of a concurrent multithreaded architecture can be improved by as much as 14%, while the number of L1 data cache misses is reduced by up to 35%.
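
The two opposing effects described above, indirect prefetching and cache pollution, can be illustrated with a toy model. The following Python sketch is not the simulator used in this study; the direct-mapped cache, its size, the two address traces, and all names in it are hypothetical and chosen only to make each effect visible in isolation.

```python
class DirectMappedCache:
    """Toy direct-mapped cache that tracks block tags only (no data)."""

    def __init__(self, num_lines, block_size):
        self.num_lines = num_lines
        self.block_size = block_size
        self.tags = [None] * num_lines

    def access(self, addr):
        """Return True on a hit; on a miss, fill the line and return False."""
        block = addr // self.block_size
        index = block % self.num_lines
        if self.tags[index] == block:
            return True
        self.tags[index] = block
        return False


def correct_path_misses(trace, run_wrong_path_loads):
    """Count the misses seen by correct-path loads only.

    trace is a list of (addr, is_wrong_path) pairs in issue order.
    Wrong-path loads touch the cache only if run_wrong_path_loads is True.
    """
    # Assumed geometry: 4 lines of 16 bytes each (illustrative only).
    cache = DirectMappedCache(num_lines=4, block_size=16)
    misses = 0
    for addr, is_wrong_path in trace:
        if is_wrong_path:
            if run_wrong_path_loads:
                cache.access(addr)  # result discarded; only the fill matters
            continue
        if not cache.access(addr):
            misses += 1
    return misses


# Prefetching case: the wrong-path load touches block 0x10, which the
# correct path needs shortly afterwards.
HELPFUL = [(0x00, False), (0x10, True), (0x10, False)]

# Pollution case: the wrong-path load at 0x40 maps to the same cache line
# as 0x00 and evicts it before the correct path reuses it.
POLLUTING = [(0x00, False), (0x40, True), (0x00, False)]

for name, trace in (("helpful", HELPFUL), ("polluting", POLLUTING)):
    squashed = correct_path_misses(trace, run_wrong_path_loads=False)
    executed = correct_path_misses(trace, run_wrong_path_loads=True)
    print(name, "squashed:", squashed, "executed:", executed)
```

In the first trace, letting the wrong-path load complete removes a correct-path miss; in the second, it adds one. Which effect dominates in a real workload depends on the access pattern and cache configuration, which is what the study measures.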
