Sparse Coding of Neural Word Embeddings for Multilingual Sequence Labeling

In this paper we propose and carefully evaluate a sequence labeling framework that relies solely on sparse indicator features derived from dense distributed word representations. The proposed model obtains (near) state-of-the-art performance on both part-of-speech tagging and named entity recognition for a variety of languages. It uses only a few thousand sparse-coding-derived features and requires no task-specific modification of the underlying word representations. The model also has favorable generalization properties: it retains over 89.8% of its average POS tagging accuracy when trained on only 1.2% of the available training data, i.e. 150 sentences per language.
