Speculative Execution on GPU: An Exploratory Study

We explore the use of GPUs for speculative execution. We implement software value prediction techniques to accelerate programs with limited parallelism, and software speculation techniques to accelerate programs whose parallelism is exposed only at runtime and is therefore hard to extract statically. Our experimental results show that, because of its relatively high overhead, mapping software value prediction to existing GPUs may not yield any immediate performance gain. Software speculation also introduces overhead, but mapping it to existing GPUs can already yield some performance gain over CPU execution.
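To make the idea of software speculation concrete, the following is a minimal sketch of a speculative loop with a runtime dependence check. It is not the implementation evaluated in this study. The loop shape (`out[writes[i]] = f(out[reads[i]])`), the function names, and the shadow structures are all assumptions chosen for illustration, and the "parallel" phase is simulated sequentially in Python, where a GPU version would assign one thread per iteration.

```python
def run_sequential(data, reads, writes, f):
    """Reference semantics: iteration i computes out[writes[i]] = f(out[reads[i]])."""
    out = list(data)
    for r, w in zip(reads, writes):
        out[w] = f(out[r])
    return out


def run_speculative(data, reads, writes, f):
    """Hypothetical helper: speculate that all iterations are independent.

    Returns (result, speculation_succeeded). On a detected conflict the
    speculative result is discarded and the loop is re-executed sequentially.
    """
    n = len(data)
    snapshot = list(data)                 # every iteration reads pre-loop state
    out = list(data)
    written_by = [None] * n               # shadow: iteration that wrote each element
    read_by = [set() for _ in range(n)]   # shadow: iterations that read each element

    # Speculative phase (each loop body would be one GPU thread).
    for i, (r, w) in enumerate(zip(reads, writes)):
        read_by[r].add(i)
        if written_by[w] is not None:
            # Two iterations write the same element: misspeculation.
            return run_sequential(data, reads, writes, f), False
        written_by[w] = i
        out[w] = f(snapshot[r])

    # Validation phase: conservatively reject any element that is written by
    # one iteration and read by a different one.
    for idx in range(n):
        writer = written_by[idx]
        if writer is not None and any(j != writer for j in read_by[idx]):
            return run_sequential(data, reads, writes, f), False

    return out, True


# Independent iterations: speculation succeeds.
ok_result, ok_flag = run_speculative([1, 2, 3, 4], [0, 1, 2, 3], [0, 1, 2, 3],
                                     lambda x: x * 2)

# Iteration i reads what iteration i-1 wrote: speculation fails and falls back.
dep_result, dep_flag = run_speculative([1, 0, 0, 0], [0, 1, 2], [1, 2, 3],
                                       lambda x: x + 1)
```

The sketch also shows where the overhead comes from: the shadow bookkeeping and the validation pass are paid on every run, and a failed speculation additionally pays for the sequential re-execution.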
