FacetE: exploiting web tables for domain-specific word embedding evaluation

Today's natural language processing and information retrieval systems heavily depend on word embedding techniques to represent text values. However, given a specific task, choosing a suitable word embedding model is not trivial. Current word embedding evaluation methods mostly provide a one-dimensional quality measure that does not express how knowledge from different domains is represented in the word embedding models. To overcome this limitation, we provide a new evaluation dataset called FacetE, derived from 125M Web tables, which enables domain-sensitive evaluation. We show that FacetE can be used effectively to evaluate word embedding models. The evaluation of common general-purpose word embedding models suggests that there is currently no single best word embedding for every domain.

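To illustrate what a domain-sensitive (faceted) evaluation looks like operationally, the following is a minimal sketch, not the FacetE protocol itself: it assumes a hypothetical set of word pairs grouped by domain facet and toy word vectors, and reports a separate cosine-similarity score per facet instead of a single global number.

```python
import numpy as np

# Toy word vectors; in practice these would come from a trained model
# such as word2vec or fastText.
embeddings = {
    "paris":   np.array([0.9, 0.1, 0.0]),
    "france":  np.array([0.8, 0.2, 0.1]),
    "berlin":  np.array([0.1, 0.9, 0.0]),
    "germany": np.array([0.2, 0.8, 0.1]),
    "aspirin": np.array([0.0, 0.1, 0.9]),
    "drug":    np.array([0.1, 0.0, 0.8]),
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical facet-grouped test items: word pairs that should be close
# in vector space, grouped by the domain (facet) they come from.
facet_items = {
    "geography": [("paris", "france"), ("berlin", "germany")],
    "medicine":  [("aspirin", "drug")],
}

# Report one score per facet rather than a single aggregate measure,
# so strengths and weaknesses per domain remain visible.
for facet, pairs in facet_items.items():
    scores = [cosine(embeddings[a], embeddings[b])
              for a, b in pairs
              if a in embeddings and b in embeddings]
    print(f"{facet}: mean cosine similarity = {np.mean(scores):.3f}")
```

A model that scores well on one facet (e.g., geography) may score poorly on another (e.g., medicine), which a single aggregate number would hide; this is the kind of per-domain breakdown the faceted evaluation is meant to expose.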