Neural Entity Synonym Set Generation using Association Information and Entity Constraint

Automatically generating entity synonym sets (i.e., sets of terms that refer to the same entity) is important for many entity-based tasks. Existing studies on entity synonym set generation either use a ranking-plus-pruning approach or treat the problem as a two-phase task (i.e., extracting synonym pairs and then organizing these pairs into synonym sets). However, these approaches ignore the association semantics of entities and suffer from error propagation. In this paper, we propose a neural-network-based approach that exploits association information and an entity constraint to generate synonym sets from a given term (i.e., entity) vocabulary. First, we propose an association-aware set-term neural network classifier that learns whether a new term should be added to a synonym set. The classifier exploits not only the entity representations but also the entity association information to extract synonymy features. Second, an entity-constraint-based synonym set generation algorithm applies the trained set-term classifier to generate entity synonym sets from the term vocabulary. Finally, we evaluate the proposed approach on three real-world datasets. The experimental results demonstrate that the proposed approach outperforms the compared approaches in entity synonym set generation.
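The generation step described in the abstract can be illustrated with a minimal sketch. It assumes a greedy, single-pass procedure: each term is offered to the existing sets, a set-term scorer estimates whether it belongs, and a constraint predicate can veto a candidate set. The names `generate_synonym_sets`, `score`, `allowed`, `threshold`, and `toy_score` are hypothetical, and the toy scorer is a stand-in for the trained association-aware classifier; the abstract does not specify the paper's actual algorithm or the form of its entity constraint.

```python
from typing import Callable, List, Sequence

# Hypothetical signatures: a scorer returns the estimated probability that
# `term` belongs to `synset`; a constraint says whether adding it is permitted.
SetTermScorer = Callable[[Sequence[str], str], float]
Constraint = Callable[[Sequence[str], str], bool]


def generate_synonym_sets(
    vocabulary: Sequence[str],
    score: SetTermScorer,
    allowed: Constraint = lambda synset, term: True,
    threshold: float = 0.5,
) -> List[List[str]]:
    """Greedily assign each term to its best-scoring permitted set.

    A term starts a new set when no permitted set scores above `threshold`.
    This is an assumed procedure, not the paper's exact algorithm.
    """
    sets: List[List[str]] = []
    for term in vocabulary:
        best_idx, best_score = -1, threshold
        for i, synset in enumerate(sets):
            if not allowed(synset, term):
                continue  # the constraint vetoes this candidate set
            p = score(synset, term)
            if p > best_score:
                best_idx, best_score = i, p
        if best_idx >= 0:
            sets[best_idx].append(term)
        else:
            sets.append([term])
    return sets


def toy_score(synset: Sequence[str], term: str) -> float:
    """Toy stand-in for a trained classifier: match on normalized spelling."""
    def norm(t: str) -> str:
        return t.lower().replace(".", "").replace(" ", "")
    return 1.0 if any(norm(m) == norm(term) for m in synset) else 0.0


if __name__ == "__main__":
    vocab = ["U.S.", "us", "China", "US", "china"]
    print(generate_synonym_sets(vocab, toy_score))
    # [['U.S.', 'us', 'US'], ['China', 'china']]
```

Because the classifier is applied directly to a (set, term) pair, sets are built in one pass rather than by first extracting pairs and then clustering them, which is the two-phase pipeline the abstract identifies as prone to error propagation.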
