BPEmb: Tokenization-free Pre-trained Subword Embeddings in 275 Languages

We present BPEmb, a collection of pre-trained subword unit embeddings in 275 languages, based on Byte-Pair Encoding (BPE). In an evaluation using fine-grained entity typing as testbed, BPEmb performs competitively, and for some languages better than alternative subword approaches, while requiring vastly fewer resources and no tokenization. BPEmb is available at this https URL
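BPEmb's subword units come from Byte-Pair Encoding. To illustrate the general idea, here is a minimal, textbook-style sketch of how BPE learns merge operations from word frequencies and then replays them to split a word it never saw. This is not BPEmb's actual training pipeline. The helper names `learn_bpe`, `merge_pair` and `segment` are hypothetical. The sketch makes several simplifying assumptions: it starts from a pre-split word list, uses a `</w>` end-of-word marker, and breaks ties by first occurrence.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Hypothetical helper: replace each non-overlapping `pair` with one symbol."""
    merged, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            merged.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            merged.append(symbols[i])
            i += 1
    return tuple(merged)


def learn_bpe(words, num_merges):
    """Hypothetical helper: learn up to `num_merges` BPE merges from a word list."""
    # Start from characters plus an end-of-word marker (an assumed convention).
    vocab = Counter(tuple(w) + ("</w>",) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by how often each word occurs.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break  # every word is already a single symbol
        # The most frequent pair wins; max() keeps the first pair seen on ties.
        best = max(pairs, key=pairs.get)
        merges.append(best)
        vocab = Counter({merge_pair(s, best): f for s, f in vocab.items()})
    return merges


def segment(word, merges):
    """Hypothetical helper: split a (possibly unseen) word by replaying merges."""
    symbols = tuple(word) + ("</w>",)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


corpus = ["low", "low", "lower", "newest", "newest", "widest"]
merges = learn_bpe(corpus, num_merges=5)
print(merges)
# [('l', 'o'), ('lo', 'w'), ('e', 's'), ('es', 't'), ('est', '</w>')]
print(segment("lowest", merges))  # ['low', 'est</w>']
```

The toy corpus never contains "lowest". Even so, the learned merges cover it with two known subwords. Each of those subwords can then carry its own pre-trained vector, which is what lets subword embeddings handle rare or unseen words.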

Benjamin Heinzerling | Michael Strube