EgoSet: Exploiting Word Ego-networks and User-generated Ontology for Multifaceted Set Expansion

A key challenge in entity set expansion is that multifaceted input seeds can lead to significant incoherence in the result set. In this paper, we present a novel approach to handling multifaceted seeds that combines existing user-generated ontologies with a novel word-similarity metric based on skip-grams. By blending the two resources, we produce sparse word ego-networks that are centered on the seed terms and capture semantic equivalence among words. We demonstrate that the resulting networks possess internally coherent clusters, which can be exploited to provide non-overlapping expansions that reflect the different semantic classes of the seeds. Empirical evaluation against state-of-the-art baselines shows that our solution, EgoSet, not only captures multiple facets in the input query but also generates expansions for each facet with higher precision.
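
To make the idea of clustering a seed-centered ego-network concrete, the following is a minimal sketch under stated assumptions, not the method from the paper. The similarity scores are made-up toy values, the threshold is arbitrary, the seed node is dropped before clustering (an assumption of this sketch), and plain connected components stand in for whatever community detection the authors use. The names `SIMILARITY`, `build_ego_network`, and `connected_components` are hypothetical.

```python
# Hypothetical toy similarity scores between word pairs (made-up numbers,
# not derived from skip-grams or any ontology).
SIMILARITY = {
    ("apple", "banana"): 0.80,
    ("apple", "orange"): 0.70,
    ("banana", "orange"): 0.90,
    ("apple", "google"): 0.75,
    ("apple", "microsoft"): 0.70,
    ("google", "microsoft"): 0.85,
    ("banana", "google"): 0.05,
    ("orange", "microsoft"): 0.10,
}


def build_ego_network(seed, similarity, threshold):
    """Return adjacency sets among the seed's neighbors, seed itself excluded.

    An edge is kept only if its score reaches the threshold, which keeps
    the network sparse.
    """
    neighbors = set()
    for (a, b), score in similarity.items():
        if score >= threshold:
            if a == seed:
                neighbors.add(b)
            elif b == seed:
                neighbors.add(a)

    adjacency = {word: set() for word in neighbors}
    for (a, b), score in similarity.items():
        if score >= threshold and a in neighbors and b in neighbors:
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


def connected_components(adjacency):
    """Group nodes into connected components (a simple stand-in for
    community detection), returned as sorted lists in sorted order."""
    seen = set()
    components = []
    for start in sorted(adjacency):
        if start in seen:
            continue
        stack, component = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for nxt in adjacency[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        components.append(sorted(component))
    return sorted(components)


ego = build_ego_network("apple", SIMILARITY, threshold=0.5)
facets = connected_components(ego)
print(facets)
```

In this toy setting the ambiguous seed "apple" yields two non-overlapping groups, one of fruit terms and one of company terms, which illustrates how separate clusters in an ego-network can correspond to separate facets of a seed.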
