Multiprogramming BSP Programs

We explore the problem of transforming a BSP program for execution on a multiprogramming architecture, where it has to share resources with other BSP programs executing at the same time.

1 The BSP model

The Bulk Synchronous Parallelism (BSP) [2] model is a general-purpose model that is both architecture-independent and efficient for most problems on today's architectures. A BSP program consists of a set of supersteps, each of which consists of: a set of threads, involving local computation on locally-held variables; a global communication in which data is moved between threads; and a barrier synchronisation, which ends the superstep, and defines the moment at which moved data becomes locally visible.

BSP does not exploit locality, so programmers may not make any assumptions about how threads will be mapped to processors. In practice, BSP implementations randomise this placement so that the set of messages to be delivered at any moment during the communication phase will have destinations that approximate a permutation of processor ids. This enables the delivery time for the communication phase to be bounded in terms of the maximum fan-in or fan-out of the communication over all processors, and a single architectural parameter g which is the available per-processor bandwidth under continuous uniformly-destined traffic.
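The communication bound can be illustrated with a minimal sketch. It assumes unit-size messages and that g is expressed as a cost in time units per message (the inverse of a bandwidth), as is conventional in BSP cost formulas. The function names `max_fan` and `communication_bound` are hypothetical and chosen only for this illustration.

```python
from collections import Counter


def max_fan(messages):
    """Return the largest fan-out or fan-in over all processors.

    `messages` is a collection of (source, destination) processor-id pairs,
    one pair per unit-size message (an assumption of this sketch).
    """
    messages = list(messages)
    fan_out = Counter(src for src, _ in messages)  # messages sent per processor
    fan_in = Counter(dst for _, dst in messages)   # messages received per processor
    return max(max(fan_out.values(), default=0),
               max(fan_in.values(), default=0))


def communication_bound(messages, g):
    """Bound on delivery time for one communication phase: max fan times g.

    `g` is assumed here to be a per-message cost (time units per message).
    """
    return max_fan(messages) * g


# Example pattern: processor 2 receives three messages, so the maximum fan is 3.
pattern = [(0, 1), (0, 2), (1, 2), (3, 2)]
h = max_fan(pattern)
bound = communication_bound(pattern, g=4)
```

For this pattern the maximum fan is 3 (the fan-in at processor 2), so with g = 4 the bound is 12 time units. The bound depends only on the most heavily loaded processor, not on which particular processors exchange messages, which is why randomised placement makes it usable.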