Automatic Synonym Discovery with Knowledge Bases

Recognizing entity synonyms from text has become a crucial task in many entity-leveraging applications. However, discovering entity synonyms from domain-specific text corpora (e.g., news articles, scientific papers) is rather challenging. Current systems take an entity name string as input to find out other names that are synonymous, ignoring the fact that often times a name string can refer to multiple entities (e.g., "apple" could refer to both Apple Inc. and the fruit apple). Moreover, most existing methods require training data manually created by domain experts to construct supervised-learning systems. In this paper, we study the problem of automatic synonym discovery with knowledge bases, that is, identifying synonyms for knowledge base entities in a given domain-specific corpus. The manually-curated synonyms for each entity stored in a knowledge base not only form a set of name strings to disambiguate the meaning for each other, but also can serve as "distant" supervision to help determine important features for the task. We propose a novel framework, called DPE, to integrate two kinds of mutually-complementing signals for synonym discovery, i.e., distributional features based on corpus-level statistics and textual patterns based on local contexts. In particular, DPE jointly optimizes the two kinds of signals in conjunction with distant supervision, so that they can mutually enhance each other in the training stage. At the inference stage, both signals will be utilized to discover synonyms for the given entities. Experimental results prove the effectiveness of the proposed framework.

[1]  Daniel Jurafsky,et al.  Learning Syntactic Patterns for Automatic Hypernym Discovery , 2004, NIPS.

[2]  Heng Ji,et al.  Label Noise Reduction in Entity Typing by Heterogeneous Partial-Label Embedding , 2016, KDD.

[3]  David J. Weir,et al.  Learning to Distinguish Hypernyms and Co-Hyponyms , 2014, COLING.

[4]  Diyi Yang,et al.  Incorporating Word Correlation Knowledge into Topic Modeling , 2015, NAACL.

[5]  Qiaozhu Mei,et al.  PTE: Predictive Text Embedding through Large-scale Heterogeneous Text Networks , 2015, KDD.

[6]  Ellen M. Voorhees,et al.  Query expansion using lexical-semantic relations , 1994, SIGIR '94.

[7]  Gang Wang,et al.  RC-NET: A General Framework for Incorporating Knowledge into Word Representations , 2014, CIKM.

[8]  Thomas C. Rindflesch,et al.  Synonym, Topic Model and Predicate-Based Query Expansion for Retrieving Clinical Documents , 2012, AMIA.

[9]  Fang Kong,et al.  Semi-Supervised Learning for Semantic Relation Classification using Stratified Sampling Strategy , 2009, EMNLP.

[10]  Zhen Wang,et al.  Knowledge Graph and Text Jointly Embedding , 2014, EMNLP.

[11]  Ralph Grishman,et al.  Semi-supervised Semantic Pattern Discovery with Guidance from Unsupervised Pattern Clusters , 2010, COLING.

[12]  Stuart Adam Battersby,et al.  Experimenting with Distant Supervision for Emotion Classification , 2012, EACL.

[13]  Jeffrey Pennington,et al.  GloVe: Global Vectors for Word Representation , 2014, EMNLP.

[14]  Tie-Yan Liu,et al.  Solving Verbal Questions in IQ Test by Knowledge-Powered Word Embedding , 2016, EMNLP.

[15]  Marie-Francine Moens,et al.  The use of topic segmentation for automatic summarization , 2002, ACL 2002.

[16]  Gemma Boleda,et al.  Inclusive yet Selective: Supervised Distributional Hypernymy Detection , 2014, COLING.

[17]  Fuchun Peng,et al.  Context sensitive synonym discovery for web search queries , 2009, CIKM.

[18]  Eric Crestan,et al.  Web-Scale Distributional Similarity and Entity Set Expansion , 2009, EMNLP.

[19]  Yu Hu,et al.  Learning Semantic Word Embeddings based on Ordinal Knowledge Constraints , 2015, ACL.

[20]  Marija Stanojevic COGNITIVE SYNONYMY: A GENERAL OVERVIEW , 2009 .

[21]  Jeffrey Dean,et al.  Distributed Representations of Words and Phrases and their Compositionality , 2013, NIPS.

[22]  Peter D. Turney Mining the Web for Synonyms: PMI-IR versus LSA on TOEFL , 2001, ECML.

[23]  Xiang Ren,et al.  Synonym Discovery for Structured Entities on Heterogeneous Graphs , 2015, WWW.

[24]  Mihai Surdeanu,et al.  The Stanford CoreNLP Natural Language Processing Toolkit , 2014, ACL.

[25]  Mohamed Yahya,et al.  ReNoun: Fact Extraction for Nominal Attributes , 2014, EMNLP.

[26]  Daniel S. Weld,et al.  Open Information Extraction Using Wikipedia , 2010, ACL.

[27]  Jason Weston,et al.  Connecting Language and Knowledge Bases with Embedding Models for Relation Extraction , 2013, EMNLP.

[28]  Tie-Yan Liu,et al.  Solving Verbal Comprehension Questions in IQ Test by Knowledge-Powered Word Embedding , 2015 .

[29]  Daniel Jurafsky,et al.  Distant supervision for relation extraction without labeled data , 2009, ACL.

[30]  Surajit Chaudhuri,et al.  Exploiting web search to generate synonyms for entities , 2009, WWW '09.

[31]  Clare R. Voss,et al.  ClusType: Effective Entity Recognition and Typing by Relation Phrase-Based Clustering , 2015, KDD.

[32]  Marti A. Hearst Automatic Acquisition of Hyponyms from Large Text Corpora , 1992, COLING.

[33]  Gerhard Weikum,et al.  PATTY: A Taxonomy of Relational Patterns with Semantic Types , 2012, EMNLP.

[34]  Jianfeng Gao,et al.  Embedding Entities and Relations for Learning and Inference in Knowledge Bases , 2014, ICLR.

[35]  Ming Zhou,et al.  Identifying Synonyms among Distributionally Similar Words , 2003, IJCAI.

[36]  Pablo N. Mendes,et al.  Improving efficiency and accuracy in multilingual entity extraction , 2013, I-SEMANTICS '13.

[37]  Mingzhe Wang,et al.  LINE: Large-scale Information Network Embedding , 2015, WWW.