Discrepant collaborative training by Sinkhorn divergences

Abstract

Deep Co-Training algorithms typically consist of two distinct and diverse feature extractors that simultaneously attempt to learn task-specific features from the same inputs. Achieving this objective is not trivial, however simple it may look, because homogeneous networks tend to mimic each other under a collaborative training setup. To address this difficulty, we use the recently proposed Sinkhorn divergence S_ε to encourage diversity between homogeneous networks. The S_ε divergence brings popular measures such as maximum mean discrepancy and the Wasserstein distance under one umbrella and provides a principled yet simple mechanism. Our empirical results in two settings, classification in the presence of noisy labels and semi-supervised image classification, demonstrate the benefits of the proposed framework in learning distinct and diverse features. In both settings, the framework improves results by a notable margin.
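To make the central quantity concrete, the following is a minimal NumPy sketch of a Sinkhorn divergence between two batches of feature vectors, in one common form from the optimal transport literature: twice the entropy-regularized transport cost between the batches, minus the two self-transport costs. The abstract does not state the exact loss used in training, so the function names, the squared Euclidean cost, the uniform sample weights, and the default values of `eps` and `n_iters` are all assumptions made for illustration. In this form, a small `eps` approaches a Wasserstein-type cost and a large `eps` behaves like a maximum mean discrepancy.

```python
import numpy as np


def _logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along one axis."""
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))


def entropic_ot_cost(x, y, eps=1.0, n_iters=200):
    """Transport cost <P, C> of the entropy-regularized plan P.

    x, y: arrays of shape (n, d) and (m, d), treated as uniform point clouds.
    The squared Euclidean ground cost is an assumption of this sketch.
    """
    n, m = len(x), len(y)
    cost = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)  # (n, m)
    log_a = np.full(n, -np.log(n))  # log of uniform weights on x
    log_b = np.full(m, -np.log(m))  # log of uniform weights on y
    f = np.zeros(n)  # dual potential for x
    g = np.zeros(m)  # dual potential for y
    for _ in range(n_iters):
        # Log-domain Sinkhorn updates (alternating dual ascent).
        f = -eps * _logsumexp((g[None, :] - cost) / eps + log_b[None, :], axis=1)
        g = -eps * _logsumexp((f[:, None] - cost) / eps + log_a[:, None], axis=0)
    log_plan = (f[:, None] + g[None, :] - cost) / eps + log_a[:, None] + log_b[None, :]
    return float(np.sum(np.exp(log_plan) * cost))


def sinkhorn_divergence(x, y, eps=1.0, n_iters=200):
    """Debiased divergence: 2 * OT(x, y) - OT(x, x) - OT(y, y)."""
    return (2.0 * entropic_ot_cost(x, y, eps, n_iters)
            - entropic_ot_cost(x, x, eps, n_iters)
            - entropic_ot_cost(y, y, eps, n_iters))


# Hypothetical usage: features from two networks on the same batch of 8 inputs.
rng = np.random.default_rng(0)
feats_a = 0.5 * rng.normal(size=(8, 2))
feats_b = feats_a + 3.0  # a shifted copy stands in for "diverse" features
d_same = sinkhorn_divergence(feats_a, feats_a)
d_shift = sinkhorn_divergence(feats_a, feats_b)
```

Identical feature batches give a divergence of zero, while the shifted batch gives a clearly positive value. A diversity term in a co-training objective would presumably reward larger values of such a quantity between the two networks' features, but how it is weighted and combined with the task loss is not specified in the abstract.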
