Revisiting Representation Degeneration Problem in Language Modeling

Weight tying is now a common setting in many language generation tasks such as language modeling and machine translation. However, a recent study reveals a potential flaw in weight tying: the learned word embeddings tend to degenerate and lie in a narrow cone when a language model is trained. The authors call this the representation degeneration problem and propose a cosine regularization to alleviate it. Nevertheless, we prove that cosine regularization is insufficient to solve the problem, as degeneration can still occur under certain conditions. In this paper, we revisit the representation degeneration problem and theoretically analyze the limitations of the previously proposed solution. We then propose an alternative regularization method, Laplacian regularization, to tackle the problem. Experiments on language modeling demonstrate the effectiveness of the proposed Laplacian regularization.
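To make the two regularizers concrete, the sketch below shows (i) a cosine regularization term that penalizes the mean pairwise cosine similarity of the word embeddings, discouraging them from collapsing into a narrow cone, and (ii) a generic graph-Laplacian penalty of the form tr(E^T L E). This is only an illustrative PyTorch sketch: the function names, the similarity graph, and the weight `lambda_reg` are assumptions, since the abstract does not give the exact formulations used in the paper.

```python
import torch
import torch.nn.functional as F


def cosine_regularizer(emb: torch.Tensor) -> torch.Tensor:
    """Mean pairwise cosine similarity of the word-embedding rows.

    Adding this term to the training loss penalizes embeddings whose
    pairwise cosines are all close to 1, i.e. a degenerate narrow cone.
    """
    normed = F.normalize(emb, dim=-1)            # unit-norm rows
    sim = normed @ normed.t()                    # |V| x |V| cosine matrix
    v = emb.size(0)
    off_diag = sim.sum() - sim.diagonal().sum()  # drop self-similarities
    return off_diag / (v * (v - 1))


def laplacian_regularizer(emb: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
    """tr(E^T L E) with the combinatorial Laplacian L = D - A.

    `adj` is assumed to be a symmetric, non-negative word-similarity
    matrix over the vocabulary; the graph construction used in the
    paper is not specified in the abstract, so this form is illustrative.
    """
    deg = torch.diag(adj.sum(dim=1))
    lap = deg - adj
    return torch.trace(emb.t() @ lap @ emb) / emb.size(0)


# Usage sketch: add one of the penalties to the language-model loss.
# vocab_size, embedding dim, and lambda_reg are placeholder values.
emb = torch.nn.Embedding(10000, 256).weight
lambda_reg = 1.0
reg = cosine_regularizer(emb)
# total_loss = cross_entropy_loss + lambda_reg * reg
```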
