Learning to Generate Representations for Novel Words: Mimic the OOV Situation in Training
[1] Mikhail Khodak, et al. A La Carte Embedding: Cheap but Effective Induction of Semantic Feature Vectors, 2018, ACL.
[2] Razvan Pascanu, et al. On the difficulty of training recurrent neural networks, 2012, ICML.
[3] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.
[4] Andrew McCallum, et al. Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data, 2001, ICML.
[5] Roland Vollgraf, et al. Contextual String Embeddings for Sequence Labeling, 2018, COLING.
[6] Wei Xu, et al. Bidirectional LSTM-CRF Models for Sequence Tagging, 2015, arXiv.
[7] Rich Caruana, et al. Overfitting in Neural Nets: Backpropagation, Conjugate Gradient, and Early Stopping, 2000, NIPS.
[8] George Kingsley Zipf. Human Behavior and the Principle of Least Effort, 1949.
[9] Erik F. Tjong Kim Sang, et al. Introduction to the CoNLL-2002 Shared Task: Language-Independent Named Entity Recognition, 2002, CoNLL.
[10] Marcello Federico, et al. Compositional Representation of Morphologically-Rich Input for Neural Machine Translation, 2018, ACL.
[11] Marco Baroni, et al. High-risk learning: acquiring new word vectors from tiny data, 2017, EMNLP.
[12] Kevin Gimpel, et al. Mapping Unseen Words to Task-Trained Embedding Spaces, 2015, Rep4NLP@ACL.
[13] Philip Bachman, et al. Learning with Pseudo-Ensembles, 2014, NIPS.
[14] Wang Ling, et al. Finding Function in Form: Compositional Character Models for Open Vocabulary Word Representation, 2015, EMNLP.
[15] Eduard H. Hovy, et al. End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF, 2016, ACL.
[16] Jeffrey Pennington, et al. GloVe: Global Vectors for Word Representation, 2014, EMNLP.
[17] Tomas Mikolov, et al. Enriching Word Vectors with Subword Information, 2016, TACL.
[18] Tommaso Caselli, et al. When it's all piling up: investigating error propagation in an NLP pipeline, 2015, WNACP@NLDB.
[19] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[20] Nitish Srivastava, et al. Dropout: a simple way to prevent neural networks from overfitting, 2014, J. Mach. Learn. Res.
[21] C. Lee Giles, et al. Overfitting and neural networks: conjugate gradient and backpropagation, 2000, Proceedings of the IEEE-INNS-ENNS International Joint Conference on Neural Networks (IJCNN 2000).
[22] Jürgen Schmidhuber, et al. Long Short-Term Memory, 1997, Neural Computation.
[23] Sabine Buchholz, et al. Introduction to the CoNLL-2000 Shared Task: Chunking, 2000, CoNLL/LLL.
[24] Beatrice Santorini, et al. Building a Large Annotated Corpus of English: The Penn Treebank, 1993, Computational Linguistics.
[25] Xuanjing Huang, et al. Long Short-Term Memory Neural Networks for Chinese Word Segmentation, 2015, EMNLP.
[26] Jacob Eisenstein, et al. Mimicking Word Embeddings using Subword RNNs, 2017, EMNLP.