Run-time techniques for dynamic multithreaded computations

Programming models based on dynamic multithreading enable convenient expression of irregular parallel applications by providing high-level features such as object-oriented programming, a global object name space, and elective concurrency. Despite these programmability advantages, such models have not gained widespread use because they are challenging to implement efficiently on scalable parallel machines, owing to their dynamic, irregular thread structure (e.g., granularity variations) and unpredictable data access patterns. In this dissertation, we present an execution framework consisting of novel run-time mechanisms that overcome these challenges. The framework separates the concerns of local and parallel efficiency, enabling mechanisms to be developed and optimized independently for each. The local efficiency techniques, hybrid stack-heap execution and pull messaging, support low-overhead thread management and communication that deliver high performance when run-time locality and load balance are favorable. The parallel efficiency techniques, view caching and a hierarchical load balancing framework, exploit knowledge of application behavior to establish good locality and load balance at run time, achieving performance that is robust to computation irregularity. The execution framework has been implemented in the Illinois Concert System, targeting the ICC++ language. Evaluation of the mechanisms on two target platforms, the Cray T3D and the SGI Origin 2000, for four large irregular applications demonstrates their individual effectiveness and collective sufficiency: each application achieves performance comparable to the best achievable by low-level means. Moreover, these improvements persist across a range of hardware communication architectures. Our results imply that high-level expressions of parallel programs need not sacrifice performance.