Data-Triggered Threads

This thesis introduces the data-triggered threads (DTT) programming and execution model. Unlike threads in conventional parallel programming models, the DTT model initiates threads on changes to memory locations. This enables increased parallelism and the elimination of redundant, unnecessary computation. This thesis shows that 78% of all loads fetch redundant data, leading to a high incidence of redundant computation. By expressing computation through the DTT model, that computation is executed once when the data changes, and is skipped whenever the data does not change. The set of C SPEC benchmarks show performance speedup of up to 5.9X, and averaging 46% with architectural support. To improve the generality of the DTT model, this thesis also demonstrates a software-only runtime system that allows DTT programs running on top of existing machines. With mechanisms to minimize the multithreading overhead and dynamically turning on/off the DTT model, the software runtime system improves the performance of serial C SPEC benchmarks by 15% on a Nehalem processor, but by over 7X over the full suite of single-thread applications. We also show that the DTT model can work in conjunction with traditional parallelism using the software-only framework. The DTT model provides up to 64X speedup over parallel applications exploiting traditional parallelism. This thesis also discusses CDTT, a compiler framework that takes C/C++ code and automatically generates a binary that applies the DTT model to eliminate dynamically redundant code without programmer intervention. With the help of idempotence analysis and inter-procedural name dependence analysis, CDTT identifies potential code regions and composes support thread functions that as soon as live-in data changes. CDTT can also use profile data to target the elimination of redundant computation. The compiled binary running on top of a software runtime system can achieve nearly the same level of performance as careful hand-coded modifications in most benchmarks. CDTT improves the performance of serial C SPEC benchmarks by as much as 57% (average 11%) on a Nehalem processor