Optimal Total Exchange for a 3-D Torus of Processors

Over the last decade, the number of commercially available distributed-memory parallel computers has grown considerably, and researchers have shown great interest in using such machines to reduce the execution time of algorithms. However, generic tools for exploiting parallelism still have to be developed for ordinary users. To this end, several fundamental communication schemes (such as broadcast, total exchange, and scattering) have been studied on commonly used networks of processors (such as the hypercube, torus, and ring) [2,4-6,8,9]. This paper deals with the design of an optimal Total Exchange algorithm (often denoted ATA, for All-To-All) on a 3-D torus of processors.