A Comparison of Representation Models in a Non-Conventional Semantic Similarity Scenario

Representation models have shown very promising results in solving semantic similarity problems. Their performance is normally benchmarked on well-tailored experimental settings, but what happens with unusual data? In this paper, we present a comparison between popular representation models tested in a non-conventional scenario: assessing action reference similarity between sentences from different domains. The action reference problem is not a trivial task, given that verbs are generally ambiguous and complex to treat in NLP. We set up four variants of the same test to check whether different pre-processing steps improve the models' performance. We also compared our results with those obtained on a common benchmark dataset for a similar task.
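As background for the task described above, a minimal sketch of how representation models are typically used to score sentence similarity: each sentence is mapped to a vector and pairs are compared with cosine similarity. The word vectors below are toy placeholders (simple averaged bag-of-embeddings, not the output of any model evaluated in the paper), used only to illustrate the comparison step.

```python
import numpy as np

def cosine_similarity(a, b):
    # Cosine of the angle between two embedding vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def sentence_embedding(tokens, vectors):
    # Average the word vectors of a sentence's tokens: a simple
    # bag-of-embeddings baseline; contextual models behave differently.
    return np.mean([vectors[t] for t in tokens if t in vectors], axis=0)

# Toy word vectors (illustrative only).
vectors = {
    "open":   np.array([0.9, 0.1, 0.0]),
    "close":  np.array([0.1, 0.9, 0.0]),
    "door":   np.array([0.2, 0.2, 0.9]),
    "window": np.array([0.3, 0.1, 0.8]),
}

s1 = sentence_embedding(["open", "door"], vectors)
s2 = sentence_embedding(["open", "window"], vectors)
s3 = sentence_embedding(["close", "door"], vectors)

# Sentences sharing the same action verb score as more similar.
print(cosine_similarity(s1, s2) > cosine_similarity(s1, s3))  # → True
```

Contextual models (e.g. BERT-style encoders) replace the static lookup and averaging with context-dependent sentence vectors, but the final pairwise comparison is usually still cosine similarity.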
