Bitwise aggregate networks

Typical communication networks for parallel processing are based on sending data from one processor to one, or all, of the other processors. Using such a network, many simple operations that require information from every processor requires many point-to-point or broadcast communications. These aggregate operations can be as simple as a barrier synchronization or as complex as an arithmetic reduction. In this paper we discuss a class of networks that directly implement a wide range of aggregate operations. These networks are capable of performing aggregate operations in a single communication operation using only simple bitwise combining logic in a trivially scalable tree configuration.