How does BERT capture semantics? A closer look at polysemous words
Florian Schmidt | Yannic Kilcher | David Yenicelik
[1] Nando de Freitas,et al. Bayesian Optimization in High Dimensions via Random Embeddings , 2013, IJCAI.
[2] Florian Schmidt,et al. BERT as a Teacher: Contextual Embeddings for Sequence-Level Reward , 2020, ArXiv.
[3] William M. Rand,et al. Objective Criteria for the Evaluation of Clustering Methods , 1971 .
[4] Yoshua Bengio,et al. Random Search for Hyper-Parameter Optimization , 2012, J. Mach. Learn. Res.
[5] Omer Levy,et al. GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding , 2018, BlackboxNLP@EMNLP.
[6] Mikael Kågebäck,et al. Word Sense Disambiguation using a Bidirectional LSTM , 2016, CogALex@COLING.
[7] P. Rousseeuw. Silhouettes: a graphical aid to the interpretation and validation of cluster analysis , 1987 .
[8] Chandler May,et al. On Measuring Social Biases in Sentence Encoders , 2019, NAACL.
[9] Christian Biemann,et al. Making Sense of Word Embeddings , 2016, Rep4NLP@ACL.
[10] Jeffrey Pennington,et al. GloVe: Global Vectors for Word Representation , 2014, EMNLP.
[11] Guillaume Lample,et al. Word Translation Without Parallel Data , 2017, ICLR.
[12] Ricardo Ribeiro,et al. L2F/INESC-ID at SemEval-2019 Task 2: Unsupervised Lexical Semantic Frame Induction using Contextualized Word Representations , 2019, SemEval@NAACL-HLT.
[13] M. Cugmas,et al. On comparing partitions , 2015 .
[14] Benoît Sagot,et al. What Does BERT Learn about the Structure of Language? , 2019, ACL.
[15] Delbert Dueck,et al. Clustering by Passing Messages Between Data Points , 2007, Science.
[16] José Camacho-Collados,et al. WiC: the Word-in-Context Dataset for Evaluating Context-Sensitive Meaning Representations , 2018, NAACL.
[17] Jason Weston,et al. Natural Language Processing (Almost) from Scratch , 2011, J. Mach. Learn. Res.
[18] George A. Miller,et al. Introduction to WordNet: An On-line Lexical Database , 1990 .
[19] Hans-Peter Kriegel,et al. A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise , 1996, KDD.
[20] Martin Wattenberg,et al. Visualizing and Measuring the Geometry of BERT , 2019, NeurIPS.
[21] Shai Ben-David,et al. Clusterability: A Theoretical Study , 2009, AISTATS.
[22] Christian Biemann,et al. Retrofitting Word Representations for Unsupervised Sense Aware Word Similarities , 2018, LREC.
[23] Ming-Wei Chang,et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.
[24] P. Hagoort. Interplay between Syntax and Semantics during Sentence Comprehension: ERP Effects of Combining Syntactic and Semantic Violations , 2003, Journal of Cognitive Neuroscience.
[25] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[26] Kawin Ethayarajh,et al. How Contextual are Contextualized Word Representations? Comparing the Geometry of BERT, ELMo, and GPT-2 Embeddings , 2019, EMNLP.
[27] Yoav Shoham,et al. SenseBERT: Driving Some Sense into BERT , 2019, ACL.
[28] Jimmy J. Lin,et al. Simple BERT Models for Relation Extraction and Semantic Role Labeling , 2019, ArXiv.
[29] Bin Wang,et al. Evaluating word embedding models: methods and experimental results , 2019, APSIPA Transactions on Signal and Information Processing.
[30] Gregor Wiedemann,et al. Does BERT Make Any Sense? Interpretable Word Sense Disambiguation with Contextualized Embeddings , 2019, KONVENS.
[31] Elia Bruni,et al. Multimodal Distributional Semantics , 2014, J. Artif. Intell. Res.
[32] Ricardo J. G. B. Campello,et al. Density-Based Clustering Based on Hierarchical Density Estimates , 2013, PAKDD.
[33] Jeffrey Dean,et al. Distributed Representations of Words and Phrases and their Compositionality , 2013, NIPS.
[34] Christian Biemann,et al. Chinese Whispers - an Efficient Graph Clustering Algorithm and its Application to Natural Language Processing Problems , 2006 .
[35] Katrin Erk,et al. Word Sense Clustering and Clusterability , 2016, CL.
[36] G. Miller,et al. Contextual correlates of semantic similarity , 1991 .
[37] Dorin Comaniciu,et al. Mean Shift: A Robust Approach Toward Feature Space Analysis , 2002, IEEE Trans. Pattern Anal. Mach. Intell.