SAGA: Self-Augmentation with Guided Attention for Representation Learning

Self-supervised training that couples contrastive learning with a wide spectrum of data augmentation techniques has proven to be a successful paradigm for representation learning. However, current methods implicitly maximize the agreement between differently augmented views of the same sample, which can degrade the learned representation in certain situations. For example, given an image of a boat on the sea, one augmented view may be cropped solely from the boat and the other solely from the sea; linking these two views to form a positive pair could be misleading. To resolve this issue, we introduce a Self-Augmentation with Guided Attention (SAGA) strategy, which augments input data based on predictive attention to learn representations rather than simply applying off-the-shelf augmentation schemes. As a result, the proposed self-augmentation framework enables feature learning to enhance the robustness of the learned representation.
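The boat-and-sea failure case can be illustrated with a toy sketch. This is not the SAGA algorithm, which the abstract does not specify. It is a minimal illustration that assumes an attention map is already available as a 2D grid of scores. The helper names `crop_overlap` and `best_attention_crop`, the 6x6 grid, and the box coordinates are all hypothetical.

```python
def crop_overlap(a, b):
    """Intersection area of two boxes given as (top, left, height, width)."""
    top = max(a[0], b[0])
    left = max(a[1], b[1])
    bottom = min(a[0] + a[2], b[0] + b[2])
    right = min(a[1] + a[3], b[1] + b[3])
    return max(0, bottom - top) * max(0, right - left)


def best_attention_crop(attention, crop_h, crop_w):
    """Return the (top, left, h, w) window with the largest summed attention.

    Brute-force search, chosen for clarity only; ties keep the first window
    found in row-major order.
    """
    rows, cols = len(attention), len(attention[0])
    best_box, best_score = None, float("-inf")
    for top in range(rows - crop_h + 1):
        for left in range(cols - crop_w + 1):
            score = sum(
                attention[r][c]
                for r in range(top, top + crop_h)
                for c in range(left, left + crop_w)
            )
            if score > best_score:
                best_box, best_score = (top, left, crop_h, crop_w), score
    return best_box


# Toy 6x6 attention map (assumed): the "boat" occupies rows 1-2, cols 3-4,
# and everything else is "sea" with zero attention.
attention = [[0.0] * 6 for _ in range(6)]
for r in (1, 2):
    for c in (3, 4):
        attention[r][c] = 1.0

# Two unguided crops that happen to land on the boat and on open sea.
boat_view = (1, 3, 2, 2)
sea_view = (4, 0, 2, 2)

# A crop chosen by following the attention map instead.
guided_view = best_attention_crop(attention, 3, 3)
```

In this toy setting, `boat_view` and `sea_view` share no pixels, so pairing them as positives asks the model to match unrelated content, whereas `guided_view` is placed so that it covers the attended object.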