Model Compression Hardens Deep Neural Networks: A New Perspective to Prevent Adversarial Attacks

Deep neural networks (DNNs) have demonstrated phenomenal success in many real-world applications. However, recent work shows that a DNN's decisions can easily be misguided by adversarial examples, i.e., inputs with imperceptible perturbations crafted by a malicious adversary, raising growing security concerns for DNN-based systems. Unfortunately, current defense techniques face two issues: (1) they usually cannot mitigate all types of attacks, because the diverse attacks that may occur in practical scenarios have different natures; and (2) most of them incur considerable implementation cost, such as complete retraining. This creates an urgent need for a comprehensive defense framework with low deployment cost. In this work, we reveal that a ``defensive decision boundary'' and a ``small gradient'' are two critical conditions for reducing the effectiveness of adversarial examples with different properties. We propose to judiciously use ``hash compression'' to reconstruct a low-cost ``defensive hash classifier'' that forms the first line of our defense. We then propose a set of retraining-free ``gradient inhibition'' (GI) methods that strongly suppress and randomize the gradients used to craft adversarial examples. Finally, we develop a comprehensive defense framework by combining the ``defensive hash classifier'' and ``GI''. We evaluate our defense in traditional white-box, strong adaptive white-box, and black-box settings. Extensive studies show that our solution greatly decreases the attack success rate of various adversarial attacks on diverse datasets.