Word Representations: A Simple and General Method for Semi-Supervised Learning

If we take an existing supervised NLP system, a simple and general way to improve accuracy is to use unsupervised word representations as extra word features. We evaluate Brown clusters, Collobert and Weston (2008) embeddings, and HLBL (Mnih & Hinton, 2009) embeddings of words on both named-entity recognition (NER) and chunking. We use near state-of-the-art supervised baselines, and find that each of the three word representations improves the accuracy of these baselines. We find further improvements by combining different word representations. You can download our word features, for off-the-shelf use in existing NLP systems, as well as our code, here: http://metaoptimize.com/projects/wordreprs/
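
As a concrete illustration of how word representations can be plugged into an existing supervised system as extra word features, the sketch below builds a per-token feature dictionary from a Brown-cluster bit string and an embedding vector, the form most CRF or perceptron toolkits accept. It is a minimal sketch, not the released code: the cluster table, embedding table, prefix lengths, and scaling factor are illustrative placeholders; the real resources would be loaded from the files at the URL above.

```python
# Minimal sketch (not the authors' released code): turning unsupervised
# word representations into extra per-token features for a supervised
# tagger. All tables and constants below are toy placeholders.

# Brown clusters: each word maps to a bit string (its path in the cluster
# hierarchy); prefixes of that path give coarser clusters.
BROWN_CLUSTERS = {            # toy values, for illustration only
    "john": "0110",
    "london": "0111",
    "bought": "1010",
}

# Word embeddings: each word maps to a low-dimensional real vector
# (e.g., Collobert & Weston or HLBL embeddings).
EMBEDDINGS = {                # toy 3-dimensional vectors
    "john": [0.21, -0.40, 0.05],
    "london": [0.18, -0.35, 0.10],
    "bought": [-0.60, 0.22, 0.47],
}

def word_repr_features(word, prefix_lengths=(4, 6, 10, 20)):
    """Extra features for one token, derived from its unsupervised word
    representations, returned as a name -> value dict."""
    w = word.lower()
    feats = {}

    # Cluster features: one indicator feature per bit-string prefix length.
    path = BROWN_CLUSTERS.get(w)
    if path is not None:
        for k in prefix_lengths:
            feats["brown_prefix%d=%s" % (k, path[:k])] = 1.0

    # Embedding features: one real-valued feature per dimension,
    # optionally rescaled before being handed to the supervised learner.
    vec = EMBEDDINGS.get(w)
    if vec is not None:
        scale = 0.5  # hypothetical scaling factor
        for i, x in enumerate(vec):
            feats["embed_%d" % i] = scale * x

    return feats

if __name__ == "__main__":
    for token in ["John", "bought", "shares"]:
        print(token, word_repr_features(token))
```

These features are simply appended to whatever lexical, orthographic, and contextual features the baseline NER or chunking system already uses; unknown words fall back to the original feature set unchanged.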

[1] S. T. Dumais, et al. Using latent semantic analysis to improve access to textual information, 1988, CHI '88.

[2] Robert L. Mercer, et al. Class-Based n-gram Models of Natural Language, 1992, CL.

[3] Naftali Tishby, et al. Distributional Clustering of English Words, 1993, ACL.

[4] J. Elman. Learning and development in neural networks: the importance of starting small, 1993, Cognition.

[5] Timo Honkela, et al. Contextual Relations of Words in Grimm Tales, Analyzed by Self-Organizing Map, 1995.

[6] Hermann Ney, et al. Algorithms for bigram and trigram word clustering, 1995, Speech Commun.

[7] Curt Burgess, et al. Producing high-dimensional semantic spaces from lexical co-occurrence, 1996.

[8] Akira Ushioda, et al. Hierarchical Clustering of Words, 1996, COLING.

[9] T. Honkela. Self-Organizing Maps of Words for Natural Language Processing Applications, 1997.

[10] Samuel Kaski, et al. Dimensionality reduction by random mapping: fast similarity computation for clustering, 1998, IEEE International Joint Conference on Neural Networks.

[11] Yoshua Bengio, et al. A Neural Probabilistic Language Model, 2003, J. Mach. Learn. Res.

[12] Sabine Buchholz, et al. Introduction to the CoNLL-2000 Shared Task: Chunking, 2000, CoNLL/LLL.

[13] Magnus Sahlgren, et al. Vector-based semantic analysis: representing word meanings based on random labels, 2001.

[14] Jean-Luc Gauvain, et al. Connectionist language modeling for large vocabulary continuous speech recognition, 2002, IEEE International Conference on Acoustics, Speech, and Signal Processing.

[15] Tong Zhang, et al. A Robust Risk Minimization based Named Entity Recognition System, 2003, CoNLL.

[16] Yoshua Bengio, et al. Quick Training of Probabilistic Neural Nets by Importance Sampling, 2003, AISTATS.

[17] Fernando Pereira, et al. Shallow Parsing with Conditional Random Fields, 2003, NAACL.

[18] Michael I. Jordan, et al. Latent Dirichlet Allocation, 2003, J. Mach. Learn. Res.

[19] Jaakko J. Väyrynen, et al. Word Category Maps based on Emergent Features Created by ICA, 2004.

[20] T. Kohonen, et al. Self-organizing semantic maps, 1989, Biological Cybernetics.

[21] Scott Miller, et al. Name Tagging with Word Clusters and Discriminative Training, 2004, NAACL.

[22] Percy Liang, et al. Semi-Supervised Learning for Natural Language, 2005.

[23] Magnus Sahlgren, et al. An Introduction to Random Indexing, 2005.

[24] Jaakko J. Väyrynen, et al. Comparison of Independent Component Analysis and Singular Value Decomposition in Word Context Analysis, 2005.

[25] Tong Zhang, et al. A High-Performance Semi-Supervised Learning Method for Text Chunking, 2005, ACL.

[26] Wei Li, et al. Semi-Supervised Sequence Modeling with Syntactic Topic Models, 2005, AAAI.

[27] Yoshua Bengio, et al. Hierarchical Probabilistic Neural Network Language Model, 2005, AISTATS.

[28] Magnus Sahlgren, et al. The Word-Space Model: using distributional analysis to represent syntagmatic and paradigmatic relations between words in high-dimensional vector spaces, 2006.

[29] Christopher D. Manning, et al. An Effective Two-Stage Model for Exploiting Non-Local Dependencies in Named Entity Recognition, 2006, ACL.

[30] Timo Honkela, et al. Towards explicit semantic features using independent component analysis, 2007.

[31] Geoffrey E. Hinton, et al. Three new graphical models for statistical language modelling, 2007, ICML '07.

[32] Jason Weston, et al. A unified architecture for natural language processing: deep neural networks with multitask learning, 2008, ICML '08.

[33] Xavier Carreras, et al. Simple Semi-supervised Dependency Parsing, 2008, ACL.

[34] Jun Suzuki, et al. Semi-Supervised Sequential Labeling and Segmentation Using Giga-Word Scale Unlabeled Data, 2008, ACL.

[35] Yoshua Bengio, et al. Neural net language models, 2008, Scholarpedia.

[36] Geoffrey E. Hinton, et al. A Scalable Hierarchical Distributed Language Model, 2008, NIPS.

[37] Joseph P. Turian. A preliminary evaluation of word representations for named-entity recognition, 2009.

[38] Patrick F. Reidy. An Introduction to Latent Semantic Analysis, 2009.

[39] Jason Weston, et al. Curriculum learning, 2009, ICML '09.

[40] Marie Candito, et al. Improving generative statistical parsing with semi-supervised word clustering, 2009, IWPT.

[41] Hai Zhao, et al. Multilingual Dependency Learning: A Huge Feature Engineering Method to Semantic Dependency Parsing, 2009, CoNLL Shared Task.

[42] Dan Roth, et al. Design Challenges and Misconceptions in Named Entity Recognition, 2009, CoNLL.

[43] Marie-Francine Moens, et al. Semi-supervised Semantic Role Labeling Using the Latent Words Language Model, 2009, EMNLP.

[44] Xavier Carreras, et al. An Empirical Study of Semi-supervised Structured Conditional Models for Dependency Parsing, 2009, EMNLP.

[45] Dekang Lin, et al. Phrase Clustering for Discriminative Learning, 2009, ACL.

[46] Alexander Yates, et al. Distributional Representations for Handling Sparsity in Supervised Sequence-Labeling, 2009, ACL.

[47] Reut Tsarfaty, et al. Enhancing Unlexicalized Parsing Performance Using a Wide Coverage Lexicon, Fuzzy Tag-Set Mapping, and EM-HMM-Based Lexical Probabilities, 2009, EACL.

[48] Patrick Pantel, et al. From Frequency to Meaning: Vector Space Models of Semantics, 2010, J. Artif. Intell. Res.

[49] Valentin I. Spitkovsky, et al. From Baby Steps to Leapfrog: How “Less is More” in Unsupervised Dependency Parsing, 2010, NAACL.

[50] Petr Sojka, et al. Software Framework for Topic Modelling with Large Corpora, 2010.