Analyzing the Forgetting Problem in Pretrain-Finetuning of Open-domain Dialogue Response Models