Empower Entity Set Expansion via Language Model Probing

Entity set expansion, which aims to expand a small seed entity set with new entities belonging to the same semantic class, is a critical task that benefits many downstream NLP and IR applications, such as question answering, query understanding, and taxonomy construction. Existing set expansion methods bootstrap the seed entity set by adaptively selecting context features and extracting new entities. A key challenge for entity set expansion is to avoid selecting ambiguous context features, which shift the class semantics and lead to cumulative errors in later iterations. In this study, we propose a novel iterative set expansion framework that leverages automatically generated class names to address this semantic drift issue. In each iteration, we select one positive and several negative class names by probing a pre-trained language model, and then score each candidate entity based on the selected class names. Experiments on two datasets show that our framework generates high-quality class names and significantly outperforms previous state-of-the-art methods.
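
The iteration described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the names `rank_class_names`, `expand`, and `toy_probe` are hypothetical, the affinity table is made-up toy data standing in for a real language-model probe, and the scoring rule (positive-class affinity minus the strongest negative-class affinity) is a simplifying assumption chosen only to show how negative class names can guard against semantic drift.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical probe signature: (class_name, entity) -> affinity score.
# In a real system this would query a pre-trained language model.
ProbeFn = Callable[[str, str], float]


def rank_class_names(entities: List[str], class_names: List[str],
                     probe: ProbeFn) -> List[str]:
    """Order class names by their mean affinity to the current entity set."""
    mean_affinity = {
        c: sum(probe(c, e) for e in entities) / len(entities)
        for c in class_names
    }
    return sorted(class_names, key=lambda c: mean_affinity[c], reverse=True)


def expand(seeds: List[str], candidates: List[str], class_names: List[str],
           probe: ProbeFn, num_negative: int = 1, per_iter: int = 1,
           iterations: int = 2) -> List[str]:
    """Toy iterative expansion guided by one positive and some negative names."""
    current = list(seeds)
    for _ in range(iterations):
        ranked = rank_class_names(current, class_names, probe)
        positive = ranked[0]
        # Assumption: the lowest-ranked remaining names act as negatives.
        negatives = ranked[1:][-num_negative:]

        def score(entity: str) -> float:
            penalty = max((probe(n, entity) for n in negatives), default=0.0)
            return probe(positive, entity) - penalty

        pool = [e for e in candidates if e not in current]
        pool.sort(key=score, reverse=True)
        current.extend(pool[:per_iter])
    return current


# Made-up affinities used only to make the sketch runnable.
AFFINITY: Dict[Tuple[str, str], float] = {
    ("countries", "France"): 0.9, ("countries", "Germany"): 0.9,
    ("countries", "Japan"): 0.8, ("countries", "Canada"): 0.7,
    ("countries", "Paris"): 0.3, ("countries", "Berlin"): 0.2,
    ("cities", "France"): 0.2, ("cities", "Germany"): 0.2,
    ("cities", "Japan"): 0.1, ("cities", "Canada"): 0.1,
    ("cities", "Paris"): 0.9, ("cities", "Berlin"): 0.9,
}


def toy_probe(class_name: str, entity: str) -> float:
    """Look up a toy affinity; unknown pairs score 0.0."""
    return AFFINITY.get((class_name, entity), 0.0)


result = expand(["France", "Germany"],
                ["Paris", "Japan", "Berlin", "Canada"],
                ["countries", "cities"], toy_probe)
print(result)
```

In this toy run the negative class name "cities" pushes "Paris" and "Berlin" down the ranking, even though they co-occur naturally with the seed countries, which is the kind of drift the framework is meant to avoid.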
