Convergent Superiority of Bipolar Nets from the Viewpoint of Input Set Separability

This paper formulates, in terms of normal vectors, the separability condition that the separation hyperplanes pass through the rectangles surrounding the input sets. The distributions of the normal vectors satisfying this condition are then depicted for the two-dimensional case. These distributions show that the condition for the first hidden layer varies significantly even when the input patterns are simply translated, and that the conditions (in a wider sense) for the other layers differ depending on whether the units are unipolar or bipolar, i.e., whether their activation ranges from 0 to 1 or from -0.5 to 0.5. The initial distributions of the normal vectors, with the weights initialized in the ordinary manner by zero-mean random numbers, are also depicted. Comparing these initial distributions with the separability conditions leads to the conclusion that bipolar nets are superior to unipolar nets in the convergence of back-propagation learning initialized in this ordinary manner. Bipolar nets exhibit better convergence than unipolar nets even when the divisions of the input space by the separation hyperplanes in every layer are geometrically the same at the outset of learning.
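
The geometric idea can be illustrated with a minimal sketch, which is not the paper's own procedure. It assumes two-dimensional inputs, weights and biases drawn uniformly from [-1, 1] (one possible zero-mean initialization), and takes the surrounding rectangle to be the full output range of the preceding layer: [0, 1]^2 for unipolar units and [-0.5, 0.5]^2 for bipolar units. The function names and the sampling range are hypothetical choices made for illustration only. The sketch estimates how often a randomly initialized hyperplane w.x + b = 0 passes through each rectangle.

```python
import math
import random


def unipolar(x):
    """Logistic sigmoid; output lies in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def bipolar(x):
    """Sigmoid shifted down by 0.5; output lies in (-0.5, 0.5)."""
    return unipolar(x) - 0.5


def hyperplane_hits_box(w, b, lo, hi):
    """Return True if the hyperplane w.x + b = 0 meets the box [lo, hi]^n."""
    # A linear form attains its minimum and maximum over a box at corners,
    # so the hyperplane meets the box iff zero lies between those extremes.
    g_min = b + sum(wi * (lo if wi > 0 else hi) for wi in w)
    g_max = b + sum(wi * (hi if wi > 0 else lo) for wi in w)
    return g_min <= 0.0 <= g_max


def hit_fraction(lo, hi, n_trials=20000, dim=2, seed=0):
    """Estimate the fraction of random hyperplanes that meet [lo, hi]^dim.

    Assumption: weights and bias are uniform on [-1, 1] (zero mean).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        w = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
        b = rng.uniform(-1.0, 1.0)
        hits += hyperplane_hits_box(w, b, lo, hi)
    return hits / n_trials


if __name__ == "__main__":
    print("unipolar box [0, 1]^2:      ", hit_fraction(0.0, 1.0))
    print("bipolar box [-0.5, 0.5]^2:  ", hit_fraction(-0.5, 0.5))
```

Under these assumptions the bipolar rectangle is centered on the origin, as is the distribution of the initial hyperplanes, whereas the unipolar rectangle is offset from it. The sketch only counts hyperplanes that touch the rectangle; it does not model the learning dynamics from which the convergence conclusion is drawn.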