Learning to Bootstrap for Entity Set Expansion

Bootstrapping for Entity Set Expansion (ESE) aims to iteratively acquire new instances of a specific target category. Traditional bootstrapping methods often suffer from two problems: 1) delayed feedback, i.e., a pattern's evaluation depends on both its direct extraction quality and the extraction quality in later iterations; and 2) sparse supervision, i.e., only a few seed entities are available as supervision. To address these two problems, we propose a novel bootstrapping method that combines the Monte Carlo Tree Search (MCTS) algorithm with a deep similarity network, which can efficiently estimate delayed feedback for pattern evaluation and adaptively score entities given sparse supervision signals. Experimental results confirm the effectiveness of the proposed method.
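To make the delayed-feedback problem concrete, here is a minimal sketch of generic UCT-style MCTS applied to pattern selection in a toy bootstrapping loop. It is not the authors' implementation. The corpus (`PATTERNS`), the entity scores (`ENTITY_SCORE`, a fixed lookup table standing in for the learned deep similarity network), the reward in `gain`, the lookahead `DEPTH`, and the exploration constant `c` are all hypothetical. The point is only that a pattern's value is estimated from the total gain over several simulated iterations rather than from its direct extraction alone.

```python
import math
import random

# Hypothetical toy corpus: each pattern maps to the entities it extracts.
PATTERNS = {
    "capitals like X": {"Paris", "Rome", "Madrid"},
    "X Airport": {"Berlin", "Heathrow", "Vienna"},
    "mayor of X": {"Vienna", "Oslo", "Prague", "Lisbon", "Athens"},
    "X is located in": {"Paris", "Louvre", "Alps"},
}
# Hypothetical stand-in for the deep similarity network: a fixed score in
# [0, 1] saying how well each entity fits the seeds' category (cities).
ENTITY_SCORE = {
    "Paris": 1.0, "Berlin": 1.0, "Rome": 1.0, "Madrid": 1.0, "Vienna": 1.0,
    "Oslo": 1.0, "Prague": 1.0, "Lisbon": 1.0, "Athens": 1.0,
    "Heathrow": 0.2, "Louvre": 0.1, "Alps": 0.0,
}
SEEDS = frozenset({"Paris", "Berlin"})
DEPTH = 2  # hypothetical lookahead: bootstrapping iterations simulated per decision


def gain(entities, extracted):
    """Direct reward: sum of (score - 0.5) over newly added entities, so noise subtracts."""
    return sum(ENTITY_SCORE[e] - 0.5 for e in extracted - entities)


def candidates(entities, used):
    """Unused patterns that match at least one entity acquired so far."""
    return [p for p, ext in PATTERNS.items() if p not in used and ext & entities]


class Node:
    """Search state: entities acquired and patterns used after `depth` iterations."""

    def __init__(self, entities, used, depth):
        self.entities, self.used, self.depth = entities, used, depth
        self.untried = candidates(entities, used) if depth < DEPTH else []
        self.children = {}  # pattern -> (child node, direct gain of that pattern)
        self.visits, self.value = 0, 0.0  # value sums returns credited to this node

    def step(self, pattern):
        ext = PATTERNS[pattern]
        child = Node(self.entities | ext, self.used | {pattern}, self.depth + 1)
        return child, gain(self.entities, ext)


def rollout(node):
    """Random playout to DEPTH; returns the gain accumulated after `node`."""
    entities, used, total = node.entities, node.used, 0.0
    for _ in range(DEPTH - node.depth):
        options = candidates(entities, used)
        if not options:
            break
        p = random.choice(options)
        total += gain(entities, PATTERNS[p])
        entities, used = entities | PATTERNS[p], used | {p}
    return total


def simulate(root, c=1.4):
    node, path = root, []  # path: (child node, direct gain) along the descent
    # 1) Selection: descend through fully expanded nodes using UCT.
    while not node.untried and node.children:
        parent = node
        node, r = max(
            parent.children.values(),
            key=lambda cr: cr[0].value / cr[0].visits
            + c * math.sqrt(math.log(parent.visits) / cr[0].visits),
        )
        path.append((node, r))
    # 2) Expansion: try one new pattern from this state.
    if node.untried:
        p = node.untried.pop(random.randrange(len(node.untried)))
        child, r = node.step(p)
        node.children[p] = (child, r)
        node = child
        path.append((node, r))
    # 3) Simulation: estimate the delayed feedback of later iterations.
    ret = rollout(node)
    # 4) Backpropagation: credit each node with its direct gain plus what followed.
    root.visits += 1
    for child, r in reversed(path):
        ret += r
        child.visits += 1
        child.value += ret


def best_pattern(seeds, n_sim=1000, seed=0):
    random.seed(seed)
    root = Node(seeds, frozenset(), 0)
    for _ in range(n_sim):
        simulate(root)
    # Pick the most-visited first pattern (the common "robust child" rule).
    return max(root.children, key=lambda p: root.children[p][0].visits)


direct = {p: gain(SEEDS, PATTERNS[p]) for p in candidates(SEEDS, frozenset())}
print("greedy:", max(direct, key=direct.get))
print("mcts:  ", best_pattern(SEEDS))
```

In this toy setting, ranking by direct gain picks `capitals like X`, whereas the lookahead favors `X Airport`: it adds one noisy entity (Heathrow) but also Vienna, which unlocks the high-yield pattern `mayor of X` in the next iteration.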
