Robustness of Deep Convolutional Neural Networks for Image Recognition

Recent research has shown that deep neural networks are vulnerable to images corrupted by small amounts of non-random noise, which cause them to make prediction errors. These images, known as adversarial examples, are created by exploiting the input-to-output mapping of the network. In this paper, we examine, on the MNIST database, how well the known regularization and robustness methods improve the generalization performance of deep neural networks when classifying adversarial examples and examples perturbed with random noise. We compare these methods with our proposed robustness method, an ensemble of models trained on adversarial examples, which clearly reduces prediction error. Apart from the robustness experiments, we measure human classification accuracy on adversarial examples and on examples perturbed with random noise, and compare it with the accuracy of deep neural networks measured in the same experimental settings. The results indicate that human performance does not suffer from neural network adversarial noise.