Semi-Supervised Pairing via Basis-Sharing Wasserstein Matching Auto-Encoder

Semi-supervised pairing, i.e., learning an explicit pairing relation between an unlabeled input distribution and a target distribution in an otherwise semi-supervised setting, has recently attracted considerable attention [16, 12], since data labeling is often time-consuming and labor-intensive. In this paper, we propose the Basis-Sharing Wasserstein Matching Auto-Encoder to tackle the problem of semi-supervised pairing. Our model is inspired by the success of robust representation learning for matching cross-modal latent space distributions [15, 10] and by the favorable statistical properties of optimal transport [1, 4]. In particular, the Wasserstein distance from the optimal transport family has been shown to be effective for GANs and auto-encoders [2, 14]. We propose to match the latent code distribution of the unlabeled dataset to that of the target domain, using the labeled data points as "anchor points". This helps the network form associations between the distributions of the two domains. A similar idea, realized with a different method, has been shown to work on language tasks [3]; our work explores a new approach based on distribution matching. Through preliminary experiments, we show that the proposed algorithm successfully incorporates unlabeled data to improve classification accuracy on the MNIST and CIFAR-10 datasets.
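To make the distribution-matching step concrete, the sketch below computes an entropy-regularized Wasserstein (Sinkhorn) distance [13] between a batch of latent codes from the unlabeled inputs and a batch of latent codes from the target domain, and adds a direct alignment term for the labeled anchor pairs. This is a minimal NumPy sketch, not the exact training objective of our model; the function names, batch shapes, and the trade-off weight lam are illustrative assumptions.

    import numpy as np

    def sinkhorn_distance(X, Y, eps=0.1, n_iters=100):
        """Entropy-regularized Wasserstein distance between two empirical
        distributions given as sample matrices X (n, d) and Y (m, d),
        computed with Sinkhorn iterations (Cuturi, 2013)."""
        n, m = X.shape[0], Y.shape[0]
        # Pairwise squared-Euclidean cost matrix, shape (n, m).
        C = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
        K = np.exp(-C / eps)                       # Gibbs kernel
        a, b = np.full(n, 1.0 / n), np.full(m, 1.0 / m)  # uniform marginals
        u, v = np.ones(n), np.ones(m)
        for _ in range(n_iters):                   # alternating scaling updates
            u = a / (K @ v)
            v = b / (K.T @ u)
        P = u[:, None] * K * v[None, :]            # approximate optimal coupling
        return (P * C).sum()

    def pairing_loss(z_unlabeled, z_target, z_anchor_in, z_anchor_out, lam=1.0):
        """Match the unlabeled latent distribution to the target latent
        distribution; labeled ("anchor") pairs are pulled together directly.
        lam is an assumed trade-off weight, not a value from the paper."""
        ot_term = sinkhorn_distance(z_unlabeled, z_target)
        anchor_term = ((z_anchor_in - z_anchor_out) ** 2).sum(-1).mean()
        return ot_term + lam * anchor_term

In training, a loss of this form would be added to the usual auto-encoder reconstruction terms; the anchor term grounds the otherwise permutation-invariant optimal transport matching so that the learned coupling respects the known labeled pairs.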

[1] Yang Wang, et al. Video Summarization by Learning From Unpaired Data. CVPR, 2019.

[2] Shin Ishii, et al. Virtual Adversarial Training: A Regularization Method for Supervised and Semi-Supervised Learning. IEEE TPAMI, 2017.

[3] Makoto Yamada, et al. Learning Unsupervised Word Translations Without Adversaries. EMNLP, 2018.

[4] Ben Glocker, et al. Multi-modal Learning from Unpaired Images: Application to Multi-organ Segmentation in CT and MRI. WACV, 2018.

[5] Colin Raffel, et al. Realistic Evaluation of Deep Semi-Supervised Learning Algorithms. NeurIPS, 2018.

[6] Bernhard Schölkopf, et al. Wasserstein Auto-Encoders. ICLR, 2017.

[7] Guillaume Lample, et al. Word Translation Without Parallel Data. ICLR, 2017.

[8] Gabriel Peyré, et al. Learning Generative Models with Sinkhorn Divergences. AISTATS, 2017.

[9] Shun-ichi Amari, et al. Information geometry connecting Wasserstein distance and Kullback–Leibler divergence via the entropy-relaxed transportation problem. Information Geometry, 2017.

[10] Ruslan Salakhutdinov, et al. Learning Robust Visual-Semantic Embeddings. ICCV, 2017.

[11] Harri Valpola, et al. Weight-averaged consistency targets improve semi-supervised deep learning results. arXiv preprint, 2017.

[12] Timo Aila, et al. Temporal Ensembling for Semi-Supervised Learning. ICLR, 2016.

[13] Marco Cuturi. Sinkhorn Distances: Lightspeed Computation of Optimal Transport. NIPS, 2013.

[14] C. Villani. Optimal Transport: Old and New. Springer, 2008.

[15] S. Kullback and R. A. Leibler. On Information and Sufficiency. Annals of Mathematical Statistics, 1951.