Paper Information - Distributed Representations of Words to Guide Bootstrapped Entity Classifiers

Bootstrapped classifiers iteratively generalize from a few seed examples or prototypes to other examples of target labels. However, sparseness of language and limited supervision make the task difficult. We address this problem by using distributed vector representations of words to aid the generalization. We use the word vectors to expand entity sets used for training classifiers in a bootstrapped pattern-based entity extraction system. Our experiments show that the classifiers trained with the expanded sets perform better on entity extraction from four online forums, with 30% F1 improvement on one forum. The results suggest that distributed representations can provide good directions for generalization in a bootstrapping system.
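The abstract does not spell out how the entity sets are expanded, so the following is only a minimal sketch of one plausible approach: add to a seed set every word whose vector is close, by cosine similarity, to some seed. The function names, the threshold, and the 3-dimensional toy vectors are all hypothetical and chosen for illustration; they are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def expand_entity_set(seeds, vectors, threshold=0.95):
    """Return non-seed words whose best cosine similarity to any seed is >= threshold.

    Assumption: nearest-neighbour thresholding stands in for whatever
    expansion procedure the paper actually uses.
    """
    seed_vecs = [vectors[s] for s in seeds if s in vectors]
    scores = {}
    for word, vec in vectors.items():
        if word in seeds:
            continue
        best = max((cosine(vec, sv) for sv in seed_vecs), default=0.0)
        if best >= threshold:
            scores[word] = best
    # Most similar candidates first.
    return sorted(scores, key=scores.get, reverse=True)


# Toy vectors invented for this example; real word vectors would have
# hundreds of dimensions and be learned from a large corpus.
toy_vectors = {
    "aspirin": [0.9, 0.1, 0.0],
    "ibuprofen": [0.85, 0.15, 0.05],
    "tylenol": [0.8, 0.2, 0.1],
    "headache": [0.1, 0.9, 0.1],
    "forum": [0.0, 0.1, 0.9],
}

expanded = expand_entity_set({"aspirin"}, toy_vectors, threshold=0.95)
print(expanded)
```

In a bootstrapped system, the expanded words would then be treated as additional (noisier) training examples for the entity classifier; a stricter threshold trades coverage for precision.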

Christopher D. Manning | Sonal Gupta
