Multi-Context Term Embeddings: the Use Case of Corpus-based Term Set Expansion

In this paper, we present a novel algorithm that uses a neural classifier to combine multi-context term embeddings, and we evaluate it on the use case of corpus-based term set expansion. We also present a novel dataset for intrinsic evaluation of corpus-based term set expansion algorithms. On this dataset, our algorithm improves on the best baseline by up to 5 mean average precision points.
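
For readers unfamiliar with the metric, the following is a minimal sketch of how mean average precision (MAP) can be computed for ranked term expansions. It is an illustration only, not the evaluation code used in this work: the function names, the normalization by the full gold set size, and the toy terms are all assumptions, and the paper may use a cutoff variant such as MAP@k.

```python
from typing import Iterable, Sequence, Set, Tuple


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Average precision of one ranked expansion list against a gold term set.

    Assumes `ranked` contains no duplicate terms and normalizes by the
    size of the gold set (one common convention, assumed here).
    """
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, term in enumerate(ranked, start=1):
        if term in relevant:
            hits += 1
            # Precision at this rank, counted only where a gold term appears.
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def mean_average_precision(
    runs: Iterable[Tuple[Sequence[str], Set[str]]]
) -> float:
    """Mean of the per-query average precision over (ranked, gold) pairs."""
    runs = list(runs)
    if not runs:
        return 0.0
    return sum(average_precision(r, g) for r, g in runs) / len(runs)


# Hypothetical toy data: two seed sets with their ranked expansions.
example_runs = [
    (["apple", "car", "pear"], {"apple", "pear"}),
    (["red", "table"], {"blue"}),
]
example_map = mean_average_precision(example_runs)
```

Under this convention, a difference of 5 MAP points corresponds to a difference of 0.05 in the value returned by `mean_average_precision`.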
