Yao-Hung Hubert Tsai | Martin Q. Ma | Muqiao Yang | Han Zhao | Louis-Philippe Morency | Ruslan Salakhutdinov
[1] Peter L. Bartlett,et al. The Sample Complexity of Pattern Classification with Neural Networks: The Size of the Weights is More Important than the Size of the Network , 1998, IEEE Transactions on Information Theory.
[2] Aaron C. Courville,et al. MINE: Mutual Information Neural Estimation , 2018, ArXiv.
[3] Julien Mairal,et al. Unsupervised Learning of Visual Features by Contrasting Cluster Assignments , 2020, NeurIPS.
[4] Mark Chen,et al. Generative Pretraining From Pixels , 2020, ICML.
[5] Geoffrey E. Hinton,et al. A Simple Framework for Contrastive Learning of Visual Representations , 2020, ICML.
[6] Honglak Lee,et al. An Analysis of Single-Layer Networks in Unsupervised Feature Learning , 2011, AISTATS.
[7] Nikos Komodakis,et al. Unsupervised Representation Learning by Predicting Image Rotations , 2018, ICLR.
[8] Sergey Ioffe,et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.
[9] Elise van der Pol,et al. Contrastive Learning of Structured World Models , 2020, ICLR.
[10] Lei Yu,et al. A Mutual Information Maximization Perspective of Language Representation Learning , 2019, ICLR.
[11] Alex Krizhevsky,et al. Learning Multiple Layers of Features from Tiny Images , 2009 .
[12] Joshua B. Tenenbaum,et al. Human-level concept learning through probabilistic program induction , 2015, Science.
[13] Sindy Löwe,et al. Putting An End to End-to-End: Gradient-Isolated Learning of Representations , 2019, NeurIPS.
[14] Karl Stratos,et al. Formal Limitations on the Measurement of Mutual Information , 2018, AISTATS.
[15] Martial Hebert,et al. Shuffle and Learn: Unsupervised Learning Using Temporal Order Verification , 2016, ECCV.
[16] Frank Nielsen,et al. A family of statistical symmetric divergences based on Jensen's inequality , 2010, ArXiv.
[17] Frank Nielsen,et al. On the chi square and higher-order chi distances for approximating f-divergences , 2013, IEEE Signal Processing Letters.
[18] Sebastian Nowozin,et al. f-GAN: Training Generative Neural Samplers using Variational Divergence Minimization , 2016, NIPS.
[19] Sanjeev Khudanpur,et al. Librispeech: An ASR corpus based on public domain audio books , 2015, ICASSP.
[20] Stefano Ermon,et al. Understanding the Limitations of Variational Mutual Information Estimators , 2020, ICLR.
[21] Phillip Isola,et al. Contrastive Multiview Coding , 2019, ECCV.
[22] Akshay Krishnamurthy,et al. Contrastive learning, multi-view redundancy, and linear models , 2020, ALT.
[23] Yue Wu,et al. Demystifying Self-Supervised Learning: An Information-Theoretical Framework , 2020, ArXiv.
[24] Ching-Yao Chuang,et al. Debiased Contrastive Learning , 2020, NeurIPS.
[25] Yoshua Bengio,et al. Learning deep representations by mutual information estimation and maximization , 2018, ICLR.
[26] Kurt Hornik,et al. Multilayer feedforward networks are universal approximators , 1989, Neural Networks.
[27] Chen Wang,et al. Supervised Contrastive Learning , 2020, NeurIPS.
[28] Jian Yang,et al. Selective Kernel Networks , 2019, CVPR.
[29] Chen Sun,et al. What makes for good views for contrastive learning , 2020, NeurIPS.
[30] Peter L. Bartlett,et al. Neural Network Learning - Theoretical Foundations , 1999 .
[31] Mikhail Khodak,et al. A Theoretical Analysis of Contrastive Unsupervised Representation Learning , 2019, ICML.
[32] Alexei Baevski,et al. wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations , 2020, NeurIPS.
[33] Giulio Colavolpe,et al. Elements of Information Theory , 2013 .
[34] Ming-Wei Chang,et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.
[35] Aaron C. Courville,et al. Improved Training of Wasserstein GANs , 2017, NIPS.
[36] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[37] Kaiming He,et al. Momentum Contrast for Unsupervised Visual Representation Learning , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[38] Alex Graves,et al. Long Short-Term Memory , 2020, Computer Vision.
[39] Lillian Lee,et al. Measures of Distributional Similarity , 1999, ACL.
[40] Xiaogang Wang,et al. Deep Learning Face Attributes in the Wild , 2014, 2015 IEEE International Conference on Computer Vision (ICCV).
[41] Martin J. Wainwright,et al. Estimating Divergence Functionals and the Likelihood Ratio by Convex Risk Minimization , 2008, IEEE Transactions on Information Theory.
[42] Paolo Favaro,et al. Unsupervised Learning of Visual Representations by Solving Jigsaw Puzzles , 2016, ECCV.
[43] Oriol Vinyals,et al. Representation Learning with Contrastive Predictive Coding , 2018, ArXiv.
[44] Sergey Levine,et al. Wasserstein Dependency Measure for Representation Learning , 2019, NeurIPS.
[45] Alexander A. Alemi,et al. On Variational Bounds of Mutual Information , 2019, ICML.
[46] Olga Russakovsky,et al. ImageNet Large Scale Visual Recognition Challenge , 2015 .
[47] Makoto Yamada,et al. Neural Methods for Point-wise Dependency Estimation , 2020, NeurIPS.
[48] Michael Tschannen,et al. On Mutual Information Maximization for Representation Learning , 2019, ICLR.
[49] Takafumi Kanamori,et al. Relative Density-Ratio Estimation for Robust Distribution Comparison , 2011, Neural Computation.
[50] Yang You,et al. Large Batch Training of Convolutional Networks , 2017, ArXiv:1708.03888.
[51] Lillian Lee,et al. On the effectiveness of the skew divergence for statistical language analysis , 2001, AISTATS.
[52] Armand Joulin,et al. Unsupervised Pretraining Transfers Well Across Languages , 2020, ICASSP.