Analyzing and Controlling Inter-Head Diversity in Multi-Head Attention