暂无分享,去创建一个
[1] Richard Socher,et al. Pointer Sentinel Mixture Models , 2016, ICLR.
[2] Richard Socher,et al. Regularizing and Optimizing LSTM Language Models , 2017, ICLR.
[3] Alexander M. Rush,et al. Character-Aware Neural Language Models , 2015, AAAI.
[4] Yoshua Bengio,et al. Hierarchical Probabilistic Neural Network Language Model , 2005, AISTATS.
[5] J. Wishart. Statistical tables , 2018, Global Education Monitoring Report.
[6] Moustapha Cissé,et al. Efficient softmax approximation for GPUs , 2016, ICML.
[7] Hakan Inan,et al. Tying Word Vectors and Word Classifiers: A Loss Framework for Language Modeling , 2016, ICLR.
[8] Ming-Wei Chang,et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.
[9] Hideki Nakayama,et al. Compressing Word Embeddings via Deep Compositional Code Learning , 2017, ICLR.
[10] Rico Sennrich,et al. Neural Machine Translation of Rare Words with Subword Units , 2015, ACL.
[11] Yang Li,et al. GroupReduce: Block-Wise Low-Rank Approximation for Neural Language Model Shrinking , 2018, NeurIPS.
[12] Hannaneh Hajishirzi,et al. Pyramidal Recurrent Unit for Language Modeling , 2018, EMNLP.
[13] Lior Wolf,et al. Using the Output Embedding to Improve Language Models , 2016, EACL.
[14] Rahul Goel,et al. Online Embedding Compression for Text Classification using Low Rank Matrix Factorization , 2018, AAAI.
[15] Shuang Wu,et al. Slim Embedding Layers for Recurrent Neural Language Models , 2017, AAAI.
[16] Yann Dauphin,et al. Language Modeling with Gated Convolutional Networks , 2016, ICML.
[17] Jeffrey Dean,et al. Distributed Representations of Words and Phrases and their Compositionality , 2013, NIPS.
[18] Chris Dyer,et al. On the State of the Art of Evaluation in Neural Language Models , 2017, ICLR.
[19] Richard Socher,et al. An Analysis of Neural Language Modeling at Multiple Scales , 2018, ArXiv.
[20] Yiming Yang,et al. Transformer-XL: Attentive Language Models beyond a Fixed-Length Context , 2019, ACL.
[21] Alexei Baevski,et al. Adaptive Input Representations for Neural Language Modeling , 2018, ICLR.
[22] Quoc V. Le,et al. Massive Exploration of Neural Machine Translation Architectures , 2017, EMNLP.
[23] Lukás Burget,et al. Recurrent neural network based language model , 2010, INTERSPEECH.
[24] Yu Zhang,et al. Simple Recurrent Units for Highly Parallelizable Recurrence , 2017, EMNLP.
[25] Ruslan Salakhutdinov,et al. Breaking the Softmax Bottleneck: A High-Rank RNN Language Model , 2017, ICLR.
[26] Ran El-Yaniv,et al. Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations , 2016, J. Mach. Learn. Res..
[27] Yoshua Bengio,et al. Neural Machine Translation by Jointly Learning to Align and Translate , 2014, ICLR.
[28] Thorsten Brants,et al. One billion word benchmark for measuring progress in statistical language modeling , 2013, INTERSPEECH.
[29] Yonghui Wu,et al. Exploring the Limits of Language Modeling , 2016, ArXiv.
[30] Nicolas Usunier,et al. Improving Neural Language Models with a Continuous Cache , 2016, ICLR.
[31] Ali Farhadi,et al. XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks , 2016, ECCV.
[32] Geoffrey E. Hinton,et al. A Scalable Hierarchical Distributed Language Model , 2008, NIPS.
[33] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[34] Jeffrey Pennington,et al. GloVe: Global Vectors for Word Representation , 2014, EMNLP.
[35] Luke S. Zettlemoyer,et al. Deep Contextualized Word Representations , 2018, NAACL.
[36] Alexander M. Rush,et al. OpenNMT: Open-Source Toolkit for Neural Machine Translation , 2017, ACL.
[37] R. Srikant,et al. Why Deep Neural Networks for Function Approximation? , 2016, ICLR.
[38] Boris Ginsburg,et al. Factorization tricks for LSTM networks , 2017, ICLR.
[39] Christopher D. Manning,et al. Effective Approaches to Attention-based Neural Machine Translation , 2015, EMNLP.
[40] Zhi Jin,et al. Compressing Neural Language Models by Sparse Word Representations , 2016, ACL.
[41] Joshua Goodman,et al. Classes for fast maximum entropy training , 2001, 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221).
[42] Richard Socher,et al. Learned in Translation: Contextualized Word Vectors , 2017, NIPS.
[43] Vladlen Koltun,et al. An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling , 2018, ArXiv.
[44] Ann Bies,et al. The Penn Treebank: Annotating Predicate Argument Structure , 1994, HLT.
[45] R. A. Fisher,et al. Statistical Tables for Biological, Agricultural and Medical Research , 1956 .
[46] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.