Web-Supervised Network with Softly Update-Drop Training for Fine-Grained Visual Classification

Labeling objects at the subordinate level typically requires expert knowledge, which is not always available from a random annotator. Accordingly, learning directly from web images for fine-grained visual classification (FGVC) has attracted broad attention. However, the existence of noise in web images is a huge obstacle for training robust deep neural networks. In this paper, we propose a novel approach to remove irrelevant samples from the real-world web images during training, and only utilize useful images for updating the networks. Thus, our network can alleviate the harmful effects caused by irrelevant noisy web images to achieve better performance. Extensive experiments on three commonly used fine-grained datasets demonstrate that our approach is much superior to state-of-the-art webly supervised methods. The data and source code of this work have been made anonymously available at: https://github.com/z337-408/WSNFGVC.