Distribution-preserving-based Automatic Data Augmentation for Deep Image Steganalysis

In recent years, deep learning-based steganalyzers have far outperformed steganalyzers based on handcrafted features. However, training deep networks requires a large amount of data. In steganalysis, the steganographic traces are subtle, and the steganographic signal is difficult to capture when the training set contains too few cover/stego pairs. Data augmentation has been shown to improve the accuracy and generalization of deep learning models, yet not every augmentation method suits every task. We argue that data augmentation should preserve the data distribution as seen by the target task. Because steganalysis is concerned mainly with the high-frequency signals of an image, leaving those signals unchanged keeps the data distribution largely unchanged from the perspective of steganalysis. Based on this principle, we design a neural network, called the cover augmentation network, that enriches the dataset by intelligently adding noise to an original cover to generate an augmented cover. We further design a complete data augmentation pipeline built on the cover augmentation network. Experimental results show that the proposed data augmentation method effectively improves the performance of steganalysis networks, and that the advantage is significant at low payloads.
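
The principle that a perturbation can change an image while leaving its high-frequency content untouched can be illustrated with a minimal sketch. This is not the cover augmentation network itself: the 3x3 Laplacian-style kernel is only an assumed stand-in for the high-pass filters commonly used in steganalysis, the random array is a placeholder for a real cover, and all names (`highpass_residual`, `ramp`, `pm1_noise`) are hypothetical.

```python
import numpy as np


def highpass_residual(img):
    """Apply a 3x3 Laplacian-style high-pass filter (interior pixels only).

    Assumption: this kernel is an illustrative stand-in, not a filter
    taken from the paper.
    """
    x = img.astype(np.float64)
    return (x[:-2, 1:-1] + x[2:, 1:-1] + x[1:-1, :-2] + x[1:-1, 2:]
            - 4.0 * x[1:-1, 1:-1])


rng = np.random.default_rng(0)

# Placeholder "cover": random 8-bit values, kept as floats for simplicity.
cover = rng.integers(0, 256, size=(64, 64)).astype(np.float64)

# Perturbation 1: a smooth linear ramp (purely low-frequency content).
rows, cols = np.indices(cover.shape)
ramp = 0.05 * rows + 0.03 * cols

# Perturbation 2: independent +/-1 noise (strong high-frequency content).
pm1_noise = rng.choice([-1.0, 1.0], size=cover.shape)

res_cover = highpass_residual(cover)
res_ramp = highpass_residual(cover + ramp)
res_noise = highpass_residual(cover + pm1_noise)

# Mean absolute change of the high-pass residual for each perturbation.
change_ramp = np.abs(res_ramp - res_cover).mean()
change_noise = np.abs(res_noise - res_cover).mean()

print("residual change, smooth ramp:", change_ramp)
print("residual change, +/-1 noise :", change_noise)
```

Under these assumptions the ramp changes every pixel, yet the residual is unchanged up to floating-point rounding, whereas the +/-1 noise alters the residual substantially. A real augmentation would also have to respect integer pixel values, which this sketch ignores.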