Why Squashing Functions in Multi-Layer Neural Networks

Most multi-layer neural networks used in deep learning employ rectified linear neurons. In our previous papers, we showed that if we want to use the same activation function for all the neurons, then the rectified linear function is indeed a reasonable choice. However, preliminary analysis shows that for some applications, it is more advantageous to use different activation functions for different neurons, i.e., to select a family of activation functions and then, during training, choose the parameters of each neuron's activation function individually. Specifically, this advantage was shown for a special family of squashing functions that contains the rectified linear function as a particular case. In this paper, we explain the empirical success of squashing functions by showing that the formulas describing this family follow from natural symmetry requirements.
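For illustration, one parametric family of squashing functions studied in the literature, which recovers piecewise-linear activations in a limiting regime, can be written as follows; this particular parametrization is an assumption made here for concreteness, not a restatement of the formulas derived in this paper:

% Illustrative (assumed) parametric squashing family; the exact formulas
% characterized in the paper may differ.
\[
  S_{a,\lambda}^{(\beta)}(x)
  \;=\;
  \frac{1}{\beta\lambda}\,
  \ln\!\left(
    \frac{1 + e^{\beta\,(x - a + \lambda/2)}}
         {1 + e^{\beta\,(x - a - \lambda/2)}}
  \right),
  \qquad \beta > 0,\ \lambda > 0.
\]
% As beta -> infinity, S tends to the piecewise-linear "cutting" function:
% 0 for x <= a - lambda/2, linear with slope 1/lambda in between, and 1 for
% x >= a + lambda/2. A rectified linear unit arises as a further limiting
% case, e.g., lambda * S with a = lambda/2 tends to max(x, 0) as lambda grows.

Under this (assumed) parametrization, training would adjust the parameters a, lambda, and beta separately for each neuron, which is the sense in which different neurons end up with different activation functions.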