Classify and Generate Reciprocally: Simultaneous Positive-Unlabelled Learning and Conditional Generation with Extra Data

The scarcity of class-labeled data is a ubiquitous bottleneck in many machine learning problems. While abundant unlabeled data typically exist and offer a potential solution, they are highly challenging to exploit. In this paper, we address this problem by leveraging Positive-Unlabeled (PU) classification and conditional generation with extra unlabeled data simultaneously, both of which aim to make full use of agnostic unlabeled data to improve classification and generation performance. In particular, we present a novel training framework that jointly targets PU classification and conditional generation when exposed to extra data, especially out-of-distribution unlabeled data, by exploring the interplay between them: 1) enhancing the performance of PU classifiers with the assistance of a novel Conditional Generative Adversarial Network (CGAN) that is robust to noisy labels, and 2) leveraging extra data with labels predicted by a PU classifier to help the generation. Our key contribution is a Classifier-Noise-Invariant Conditional GAN (CNI-CGAN) that can learn the clean data distribution from noisy labels predicted by a PU classifier. Theoretically, we prove the optimal condition of CNI-CGAN; experimentally, we conduct extensive evaluations on diverse datasets, verifying simultaneous improvements in both classification and generation.
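
The abstract does not state which objective the PU classifier is trained with. As an illustration only, the sketch below computes one widely used choice, the non-negative PU risk estimator with a sigmoid loss, for a binary PU problem. The function names, the class prior argument `prior`, and the use of NumPy are assumptions made for this example and are not taken from the paper.

```python
import numpy as np

def sigmoid_loss(scores, label):
    # Sigmoid loss l(z, y) = 1 / (1 + exp(y * z)); small when sign(z) matches y.
    return 1.0 / (1.0 + np.exp(label * scores))

def nn_pu_risk(scores_pos, scores_unl, prior):
    """Non-negative PU risk (illustrative sketch, not the paper's exact loss).

    scores_pos: classifier outputs on labeled positive samples
    scores_unl: classifier outputs on unlabeled samples
    prior:      assumed positive class prior, a value in (0, 1)
    """
    risk_pos_as_pos = sigmoid_loss(scores_pos, +1).mean()
    risk_pos_as_neg = sigmoid_loss(scores_pos, -1).mean()
    risk_unl_as_neg = sigmoid_loss(scores_unl, -1).mean()
    # Estimated negative-class risk; clamped at zero so it cannot go negative.
    neg_risk = risk_unl_as_neg - prior * risk_pos_as_neg
    return prior * risk_pos_as_pos + max(0.0, neg_risk)

# Hypothetical scores: positives score high, unlabeled data are mixed.
pos = np.array([2.0, 1.5, 3.0])
unl = np.array([2.5, -1.0, -2.0, 0.5])
risk = nn_pu_risk(pos, unl, prior=0.4)
```

The clamp in the last line of `nn_pu_risk` is what distinguishes the non-negative estimator from the unbiased one: it keeps the estimated negative-class risk from dropping below zero, a symptom of overfitting with flexible models.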
