FUSE: Multi-Faceted Set Expansion by Coherent Clustering of Skip-grams

Set expansion aims to expand a small set of seed entities into a complete set of relevant entities. Most existing approaches assume the input seed set is unambiguous and ignore the multi-faceted semantics of seed entities. As a result, given the seed set {"Canon", "Sony", "Nikon"}, previous methods return a single mixed set of entities, some of which are Camera Brands and others Japanese Companies. In this paper, we study the task of multi-faceted set expansion, which aims to capture all semantic facets in the seed set and return multiple sets of entities, one for each semantic facet. We propose an unsupervised framework, FUSE, which consists of three major components: (1) a facet discovery module, which identifies all semantic facets of each seed entity by extracting and clustering its skip-grams; (2) a facet fusion module, which discovers the semantic facets shared by the entire seed set through an optimization formulation; and (3) an entity expansion module, which expands each semantic facet using an iterative algorithm that is robust to skip-gram noise. Extensive experiments demonstrate that FUSE can accurately identify multiple semantic facets of the seed set and generate high-quality entities for each facet.
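To make the notion of a skip-gram concrete, the following is a minimal sketch of how local-context skip-grams might be collected for one seed entity. It is an illustration only, not the FUSE implementation: the function name `extract_skipgrams`, the `window` size, the `__` placeholder, and the toy sentences are all assumptions introduced here, and the clustering, fusion, and expansion steps are not shown.

```python
def extract_skipgrams(sentences, entity, window=2, placeholder="__"):
    """Collect the local contexts around each mention of `entity`.

    Each skip-gram is the up-to-`window` tokens on either side of a mention,
    with the mention itself replaced by `placeholder`. All parameter names and
    defaults are hypothetical choices for this sketch.
    """
    skipgrams = []
    for tokens in sentences:
        for i, token in enumerate(tokens):
            if token != entity:
                continue
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + 1 + window]
            skipgrams.append(" ".join(left + [placeholder] + right))
    return skipgrams


# Toy pre-tokenized corpus (made up for illustration).
sentences = [
    ["the", "Canon", "camera", "is", "great"],
    ["Sony", "is", "a", "Japanese", "company"],
]

canon_contexts = extract_skipgrams(sentences, "Canon", window=1)
sony_contexts = extract_skipgrams(sentences, "Sony", window=2)
print(canon_contexts)
print(sony_contexts)
```

Contexts such as "the __ camera" and "__ is a" hint at different facets of a seed; grouping an entity's skip-grams into coherent clusters is what the facet discovery module is described as doing.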
