Communications-Efficient Multithreading on Wide-Area Networks

This paper shows how to run multithreaded programs on a DRAM (Distributed Random-Access Machine) parallel computer and demonstrates that such programs can run efficiently on a collection of machines spread across thousands of miles of the Internet. Suppose that a fully strict multithreaded program has work $T_1$ and critical-path length $T_\infty$, and that we have a $P$-processor DRAM machine with an upper bound on the cost of routing any permutation. This paper presents a deterministic conservative DRAM scheduling algorithm that runs in time and a randomized conservative DRAM scheduling algorithm that runs in time . We have modified the Cilk multithreaded runtime system to use our randomized conservative DRAM scheduler. Surprisingly, the modified system, called TreeCilk, often runs faster when a single machine 2000 miles away is added to a tightly-bound cluster of machines.
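The two measures in the abstract, work $T_1$ and critical-path length $T_\infty$, come from viewing a multithreaded execution as a directed acyclic graph of instructions. $T_1$ is the total cost of all nodes. $T_\infty$ is the cost of the longest dependency chain. Ignoring communication, any $P$-processor schedule needs at least $\max(T_1/P, T_\infty)$ time. The sketch below computes both measures for a small spawn/sync "diamond." The function name `work_and_span` and the dictionary-based DAG representation are hypothetical choices made for illustration. The sketch does not model the DRAM scheduler or its routing costs.

```python
from graphlib import TopologicalSorter


def work_and_span(cost, deps):
    """Return (T_1, T_inf) for a computation DAG (hypothetical helper).

    cost: dict mapping each node to its execution time.
    deps: dict mapping a node to the nodes it depends on (its predecessors).
    Every node that appears in deps is assumed to also appear in cost.
    """
    # Work T_1: total time to execute every node on a single processor.
    work = sum(cost.values())

    # Critical-path length T_inf: the earliest finish time of each node,
    # given unlimited processors, is its own cost plus the latest finish
    # time among its predecessors. Visit nodes in topological order.
    graph = {v: tuple(deps.get(v, ())) for v in cost}
    finish = {}
    for v in TopologicalSorter(graph).static_order():
        start = max((finish[u] for u in graph[v]), default=0)
        finish[v] = start + cost[v]
    span = max(finish.values(), default=0)
    return work, span


# A spawns B and C, which both sync at D; B is the expensive branch.
cost = {"A": 1, "B": 3, "C": 1, "D": 1}
deps = {"B": ["A"], "C": ["A"], "D": ["B", "C"]}
t1, tinf = work_and_span(cost, deps)
print(t1, tinf)  # 6 5
```

Here $T_1 = 6$ and $T_\infty = 5$. The parallelism $T_1/T_\infty$ is therefore only 1.2, so adding processors beyond one or two cannot help much for this computation.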