Improving Convolutional Neural Network Using Pseudo Derivative ReLU

Rectified linear unit (ReLU) is a widely used activation function in artificial neural networks; it is considered an efficient activation function owing to its simplicity and nonlinearity. However, ReLU's derivative for negative inputs is zero, which can make some ReLUs inactive for essentially all inputs during training. Several ReLU variants have been proposed to address this problem; compared with ReLU, they differ slightly in form and bring other drawbacks, such as higher computational cost. In this study, pseudo derivatives were used to replace the original derivative of ReLU while ReLU itself was left unchanged. The pseudo derivative was designed to alleviate the zero-derivative problem while remaining broadly consistent with the original derivative. Experiments showed that using the pseudo derivative ReLU (PD-ReLU) clearly improved AlexNet (a typical convolutional neural network model) on the CIFAR-10 and CIFAR-100 tests. Furthermore, some empirical criteria for designing such pseudo derivatives were proposed.
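To make the idea concrete, the sketch below shows one way the scheme could be implemented in PyTorch: the forward pass is ordinary ReLU, while the backward pass substitutes a pseudo derivative. The specific pseudo derivative here (a small constant ALPHA for non-positive inputs) and the names PDReLUFunction and pd_relu are illustrative assumptions, not the paper's exact formulation.

```python
import torch


class PDReLUFunction(torch.autograd.Function):
    """ReLU forward pass with a pseudo derivative used in the backward pass."""

    # Hypothetical pseudo-derivative value for non-positive inputs;
    # the paper's actual pseudo derivatives may take a different form.
    ALPHA = 0.05

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return x.clamp(min=0)  # forward pass is unchanged ReLU

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        # Original ReLU derivative: 1 for x > 0, 0 otherwise.
        # Pseudo derivative: keep 1 for x > 0, but return a small nonzero
        # value for x <= 0 so gradients still reach "dead" units.
        pseudo_grad = torch.where(
            x > 0,
            torch.ones_like(x),
            torch.full_like(x, PDReLUFunction.ALPHA),
        )
        return grad_output * pseudo_grad


def pd_relu(x):
    return PDReLUFunction.apply(x)
```

Under this assumption, pd_relu can be dropped into a network such as AlexNet in place of the standard ReLU activation without changing the forward computation, only the gradients used during training.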