A Mutual Information Maximization Perspective of Language Representation Learning
[1] Michael Tschannen, et al. On Mutual Information Maximization for Representation Learning, 2019, ICLR.
[2] Omer Levy, et al. SpanBERT: Improving Pre-training by Representing and Predicting Spans, 2019, TACL.
[3] Omer Levy, et al. RoBERTa: A Robustly Optimized BERT Pretraining Approach, 2019, ArXiv.
[4] Yiming Yang, et al. XLNet: Generalized Autoregressive Pretraining for Language Understanding, 2019, NeurIPS.
[5] R. Devon Hjelm, et al. Learning Representations by Maximizing Mutual Information Across Views, 2019, NeurIPS.
[6] Sindy Löwe, et al. Greedy InfoMax for Biologically Plausible Self-Supervised Representation Learning, 2019, NeurIPS.
[7] Xu Tan, et al. MASS: Masked Sequence to Sequence Pre-training for Language Generation, 2019, ICML.
[8] Mikhail Khodak, et al. A Theoretical Analysis of Contrastive Unsupervised Representation Learning, 2019, ICML.
[9] Lei Yu, et al. Learning and Evaluating General Linguistic Intelligence, 2019, ArXiv.
[10] Guillaume Lample, et al. Cross-lingual Language Model Pretraining, 2019, NeurIPS.
[11] Yiming Yang, et al. Transformer-XL: Attentive Language Models beyond a Fixed-Length Context, 2019, ACL.
[12] Yoshua Bengio, et al. Learning deep representations by mutual information estimation and maximization, 2018, ICLR.
[13] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.
[14] Ilya Sutskever, et al. Language Models are Unsupervised Multitask Learners, 2019.
[15] Oriol Vinyals, et al. Representation Learning with Contrastive Predictive Coding, 2018, ArXiv.
[16] Percy Liang, et al. Know What You Don't Know: Unanswerable Questions for SQuAD, 2018, ACL.
[17] Omer Levy, et al. GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding, 2018, BlackboxNLP@EMNLP.
[18] Luke S. Zettlemoyer, et al. Deep Contextualized Word Representations, 2018, NAACL.
[19] Honglak Lee, et al. An efficient framework for learning sentence representations, 2018, ICLR.
[20] Aaron C. Courville, et al. MINE: Mutual Information Neural Estimation, 2018, ArXiv.
[21] Sebastian Ruder, et al. Universal Language Model Fine-tuning for Text Classification, 2018, ACL.
[22] Aäron van den Oord, et al. On variational lower bounds of mutual information, 2018.
[23] Alec Radford, et al. Improving Language Understanding by Generative Pre-Training, 2018.
[24] Jian Zhang, et al. SQuAD: 100,000+ Questions for Machine Comprehension of Text, 2016, EMNLP.
[25] Sebastian Nowozin, et al. f-GAN: Training Generative Neural Samplers using Variational Divergence Minimization, 2016, NIPS.
[26] Quoc V. Le, et al. Semi-supervised Sequence Learning, 2015, NIPS.
[27] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[28] Jeffrey Pennington, et al. GloVe: Global Vectors for Word Representation, 2014, EMNLP.
[29] Koray Kavukcuoglu, et al. Learning word embeddings efficiently with noise-contrastive estimation, 2013, NIPS.
[30] Jeffrey Dean, et al. Distributed Representations of Words and Phrases and their Compositionality, 2013, NIPS.
[31] Aapo Hyvärinen, et al. Noise-Contrastive Estimation of Unnormalized Statistical Models, with Applications to Natural Image Statistics, 2012, J. Mach. Learn. Res.
[32] Liam Paninski. Estimation of Entropy and Mutual Information, 2003, Neural Computation.
[33] Ralph Linsker. Self-organization in a perceptual network, 1988, Computer.
[34] S. Varadhan, et al. Asymptotic evaluation of certain Markov process expectations for large time, 1975.