Index bit permutations for automatic data redistribution

PISTON is a machine-independent software framework for developing scientific applications on parallel computers. It presents a consistent data-parallel, distributed-memory model across a wide range of architectures and has been implemented on MIMD, SIMD and SMP architectures. In this paper, we describe PISTON's implementation of index bit permutations (IBPs) as a means of performing automatic regular data redistributions. We derive a theoretical analysis of IBPs and compare the predicted performance with the measured performance of an IBP implementation on the Fujitsu AP1000. To determine the effectiveness of IBPs, we examine in detail their performance on two common data redistributions and compare it with that of hand-coded implementations of the same redistributions. Based on this analysis, we identify the architectural features of a MIMD machine that affect the performance of IBPs and show that IBPs are an efficient means of implementing regular data redistributions on MIMD parallel architectures.
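
As a small illustration of the idea behind an index bit permutation, the sketch below treats the address of an array element as the processor number concatenated with the local offset, and shows that a block-to-cyclic redistribution amounts to rotating the bits of that address. This is not PISTON's interface: the function names `permute_bits`, `block_to_cyclic_perm` and `split_address` are hypothetical, and the sketch assumes a one-dimensional array whose length and processor count are both powers of two.

```python
def permute_bits(index, perm):
    """Build a new index whose bit `dst` is bit `perm[dst]` of `index`."""
    out = 0
    for dst, src in enumerate(perm):
        out |= ((index >> src) & 1) << dst
    return out


def block_to_cyclic_perm(n_bits, p_bits):
    """Bit permutation mapping a block-layout address to a cyclic-layout one.

    Assumed layout: an address has n_bits bits, with the top p_bits holding
    the processor number and the remaining low bits holding the local offset.
    Under a block distribution the address of global element i is i itself;
    under a cyclic distribution the processor is the low p_bits of i.
    The mapping is therefore a rotation of the address bits by p_bits.
    """
    return [(dst + p_bits) % n_bits for dst in range(n_bits)]


def split_address(addr, n_bits, p_bits):
    """Split an address into (processor, local offset)."""
    local_bits = n_bits - p_bits
    return addr >> local_bits, addr & ((1 << local_bits) - 1)
```

For example, with 8 elements on 2 processors (`n_bits=3`, `p_bits=1`), element 5 sits on processor 1 at offset 1 in the block layout. Applying `permute_bits(5, block_to_cyclic_perm(3, 1))` and splitting the result gives processor 1, offset 2, which matches the cyclic rule of processor `5 % 2` and offset `5 // 2`.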