Reducing Communication for Distributed Learning in Neural Networks

A learning algorithm is presented for circuits consisting of a single layer of perceptrons. We refer to such circuits as parallel perceptrons. In spite of their simplicity, these circuits are universal approximators for arbitrary boolean and continuous functions. In contrast to backprop for multi-layer perceptrons, our new learning algorithm, the parallel delta rule (p-delta rule), only has to tune a single layer of weights, and it does not require the computation and communication of analog values with high precision. Reduced communication also distinguishes our new learning rule from other learning rules for such circuits, such as those traditionally used for MADALINE. A theoretical analysis shows that the p-delta rule does in fact implement gradient descent with respect to a suitable error measure, although it does not require the computation of derivatives. Furthermore, experiments on common real-world benchmark datasets show that its performance is competitive with that of other learning approaches from neural networks and machine learning. Thus our algorithm also provides an interesting new hypothesis for the organization of learning in biological neural systems.
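
To make the setting concrete, the following is a rough sketch, in Python, of the architecture described above: a parallel perceptron is a single layer of perceptrons whose ±1 outputs are summed and thresholded, and each perceptron is adjusted using only the sign of the circuit's error rather than high-precision analog values. The update shown is a simplified, delta-rule-style illustration of this idea; the class name, the margin parameter gamma, the learning rate, and the exact update conditions are assumptions for illustration and are not taken verbatim from the paper.

import numpy as np

class ParallelPerceptron:
    """Single layer of sign perceptrons whose votes are combined by a threshold."""

    def __init__(self, n_inputs, n_perceptrons=7, eta=0.01, gamma=0.1, seed=None):
        rng = np.random.default_rng(seed)
        # One weight vector per perceptron; the extra column is a bias weight.
        self.W = rng.normal(size=(n_perceptrons, n_inputs + 1))
        self.W /= np.linalg.norm(self.W, axis=1, keepdims=True)
        self.eta, self.gamma = eta, gamma

    def _augment(self, x):
        # Append a constant input so the bias is handled as an ordinary weight.
        return np.append(np.asarray(x, dtype=float), 1.0)

    def predict(self, x):
        z = self._augment(x)
        votes = np.sign(self.W @ z)                 # +/-1 output of each perceptron
        return 1.0 if votes.sum() >= 0 else -1.0    # threshold on the vote count

    def update(self, x, target):
        """One training step on (x, target), with target in {-1, +1}."""
        z = self._augment(x)
        activations = self.W @ z
        output = 1.0 if np.sign(activations).sum() >= 0 else -1.0
        for i, a in enumerate(activations):
            if output > target and a >= 0:
                # Circuit output too large: push perceptrons that voted +1 down.
                self.W[i] -= self.eta * z
            elif output < target and a < 0:
                # Circuit output too small: push perceptrons that voted -1 up.
                self.W[i] += self.eta * z
            elif target * a >= 0 and abs(a) < self.gamma:
                # Correct vote but close to the decision boundary: nudge it
                # further away to stabilize the margin.
                direction = np.sign(a) if a != 0 else target
                self.W[i] += self.eta * direction * z
            # Keep weight vectors roughly unit length so the margin is meaningful.
            self.W[i] /= max(np.linalg.norm(self.W[i]), 1e-12)

Note that each perceptron here receives only a few bits of feedback per example (whether the circuit's vote was too high, too low, or correct), which is the sense in which communication is reduced relative to backpropagation of analog error signals.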