Improving Word Embedding Factorization for Compression using Distilled Nonlinear Neural Decomposition
[1] Rahul Goel, et al. Online Embedding Compression for Text Classification using Low Rank Matrix Factorization, 2018, AAAI.
[2] Joos Vandewalle, et al. A Multilinear Singular Value Decomposition, 2000, SIAM J. Matrix Anal. Appl.
[3] Lukasz Kaiser, et al. Attention Is All You Need, 2017, NIPS.
[4] Geoffrey E. Hinton, et al. Distilling the Knowledge in a Neural Network, 2015, ArXiv.
[5] Richard Socher, et al. Pointer Sentinel Mixture Models, 2016, ICLR.
[6] Taku Kudo, et al. SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing, 2018, EMNLP.
[7] Ebru Arisoy, et al. Low-rank matrix factorization for Deep Neural Network training with high-dimensional output targets, 2013, ICASSP.
[8] Tommi S. Jaakkola, et al. Weighted Low-Rank Approximations, 2003, ICML.
[9] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[10] Jamin Shin, et al. On the Effectiveness of Low-Rank Matrix Factorization for LSTM Model Compression, 2019, ArXiv.
[11] Yuhong Guo, et al. Time-aware Large Kernel Convolutions, 2020, ICML.
[12] Salim Roukos, et al. Bleu: a Method for Automatic Evaluation of Machine Translation, 2002, ACL.
[13] Alexander Novikov, et al. Tensorizing Neural Networks, 2015, NIPS.
[14] Geoffrey E. Hinton, et al. Regularizing Neural Networks by Penalizing Confident Output Distributions, 2017, ICLR.
[15] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.
[16] Shuang Wu, et al. Slim Embedding Layers for Recurrent Neural Language Models, 2017, AAAI.
[17] Yiming Yang, et al. Transformer-XL: Attentive Language Models beyond a Fixed-Length Context, 2019, ACL.
[18] Quoc V. Le, et al. Sequence to Sequence Learning with Neural Networks, 2014, NIPS.
[19] Geoffrey E. Hinton, et al. ImageNet classification with deep convolutional neural networks, 2012, NIPS.
[20] Ivan Oseledets. Tensor-Train Decomposition, 2011, SIAM J. Sci. Comput.
[21] Kai Yu, et al. Structured Word Embedding for Low Memory Neural Network Language Model, 2018, INTERSPEECH.
[22] Yang Li, et al. GroupReduce: Block-Wise Low-Rank Approximation for Neural Language Model Shrinking, 2018, NeurIPS.
[23] Ruslan Salakhutdinov, et al. Probabilistic Matrix Factorization, 2007, NIPS.
[24] Di He, et al. Multilingual Neural Machine Translation with Knowledge Distillation, 2019, ICLR.
[25] Hideki Nakayama, et al. Compressing Word Embeddings via Deep Compositional Code Learning, 2017, ICLR.
[26] Jürgen Schmidhuber. Deep learning in neural networks: An overview, 2014, Neural Networks.
[27] Matt Post, et al. A Call for Clarity in Reporting BLEU Scores, 2018, WMT.
[28] Valentin Khrulkov, et al. Tensorized Embedding Layers for Efficient Model Compression, 2019, ArXiv.
[29] Mark Chen, et al. Language Models are Few-Shot Learners, 2020, NeurIPS.
[30] Sergey Ioffe, et al. Rethinking the Inception Architecture for Computer Vision, 2016, CVPR.
[31] Lukáš Burget, et al. Recurrent neural network based language model, 2010, INTERSPEECH.
[32] Cordelia Schmid, et al. Product Quantization for Nearest Neighbor Search, 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[33] Lior Wolf, et al. Using the Output Embedding to Improve Language Models, 2016, EACL.
[34] Rich Caruana, et al. Model compression, 2006, KDD.