Improving the Convergence Property of Soft Committee Machines by Replacing Derivative with Truncated Gaussian Function

In online gradient descent learning, the local property of the derivative of the output function can cause slow convergence. This phenomenon, called a plateau, occurs in the learning process of multilayer networks. To improve the derivative term, we propose a simple method that replaces it with a truncated Gaussian function, which greatly increases the convergence speed. We then analyze a soft committee machine trained by the proposed method and show how the method breaks a plateau. The results show that the proposed method eventually breaks the symmetry between hidden units.
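To make the idea concrete, the following is a minimal sketch of online gradient descent for a soft committee machine in which the derivative term is swapped for a truncated Gaussian. It assumes an erf output function g(u) = erf(u/sqrt(2)), whose true derivative is the Gaussian sqrt(2/pi) exp(-u^2/2); the truncation width `a`, the widened variance `sigma`, the learning rate `eta`, and the sizes `N`, `K` are all illustrative assumptions, not the paper's settings, and the exact truncation scheme used in the paper may differ.

```python
import numpy as np
from scipy.special import erf

rng = np.random.default_rng(0)

N, K = 100, 2   # input dimension, number of hidden units (assumed values)
eta = 0.1       # learning rate (assumed value)

def g(u):
    # Hidden-unit output function: erf(u / sqrt(2)).
    return erf(u / np.sqrt(2.0))

def g_prime(u):
    # Exact derivative of g: a Gaussian that vanishes quickly away from
    # the origin -- the "local property" that slows convergence.
    return np.sqrt(2.0 / np.pi) * np.exp(-u**2 / 2.0)

def g_prime_truncated(u, sigma=2.0, a=4.0):
    # Replacement derivative: a wider Gaussian, truncated to zero for
    # |u| > a. Both sigma and a are illustrative assumptions.
    return np.where(np.abs(u) <= a,
                    np.sqrt(2.0 / np.pi) * np.exp(-u**2 / (2.0 * sigma**2)),
                    0.0)

# Teacher weights B and student weights J (teacher-student scenario).
B = rng.standard_normal((K, N))
J = rng.standard_normal((K, N)) * 0.01

for step in range(10000):
    x = rng.standard_normal(N)    # input drawn from N(0, 1)^N
    t = g(B @ x).sum()            # teacher output
    u = J @ x                     # student local fields
    s = g(u).sum()                # student output
    # Online gradient step with g'(u) replaced by the truncated Gaussian.
    J += (eta / N) * (t - s) * np.outer(g_prime_truncated(u), x)
```

Because the replacement derivative stays non-negligible over a wider range of the local field, the hidden units receive distinct updates even when their weight vectors are nearly symmetric, which is the mechanism by which the plateau is escaped.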