Paper information - On the Utility of Word Embeddings for Enriching OpenWordNet-PT

The maintenance of wordnets and lexical knowledge bases typically relies on time-consuming manual effort. To reduce this effort, we propose exploiting models of distributional semantics, namely word embeddings learned from corpora, for the automatic identification of relation instances missing from a wordnet. Analogy-solving methods are first used for learning a set of relations from analogy tests focused on each relation. Despite their low accuracy, we noted that a portion of the top-ranked answers are good suggestions of relation instances that could be included in the wordnet. This procedure is applied to the enrichment of OpenWordNet-PT, a public Portuguese wordnet. Relations are learned from data acquired from this resource, and illustrative examples are provided. Results are promising for accelerating the identification of missing relation instances: we estimate that about 17% of the potential suggestions are good, a proportion that almost doubles if some suggestions are automatically invalidated.

2012 ACM Subject Classification: Computing methodologies → Lexical semantics; Computing methodologies → Language resources
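The idea of using analogy solving to suggest missing relation instances can be illustrated with a minimal sketch. The abstract does not say which analogy-solving methods are used, so this example assumes the widely known vector-offset approach (rank candidates by cosine similarity to b - a + c). The toy three-dimensional vectors, the `suggest` function, and the `known_hypernyms` set are all hypothetical and only serve the illustration; a real setup would load pretrained Portuguese embeddings and relation instances from the wordnet.

```python
import math

# Toy 3-dimensional embeddings, invented for illustration only.
EMBEDDINGS = {
    "cão": [0.9, 0.1, 0.0],
    "animal": [0.9, 0.1, 1.0],
    "rosa": [0.1, 0.9, 0.0],
    "flor": [0.1, 0.9, 1.0],
    "carro": [0.5, 0.5, 0.0],
    "veículo": [0.5, 0.5, 1.0],
}


def cosine(u, v):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def suggest(a, b, c, known, k=1):
    """Rank candidates d for 'a is to b as c is to d' by similarity to b - a + c.

    Pairs (c, d) already in `known` are skipped, so only instances that are
    missing from the (hypothetical) wordnet are suggested.
    """
    target = [
        vb - va + vc
        for va, vb, vc in zip(EMBEDDINGS[a], EMBEDDINGS[b], EMBEDDINGS[c])
    ]
    candidates = [
        w for w in EMBEDDINGS
        if w not in (a, b, c) and (c, w) not in known
    ]
    ranked = sorted(
        candidates,
        key=lambda w: cosine(EMBEDDINGS[w], target),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical hypernymy instances already present in the wordnet.
known_hypernyms = {("cão", "animal"), ("rosa", "flor")}

# Use one known pair as the example and ask for hypernym candidates of "carro".
suggestions = suggest("cão", "animal", "carro", known_hypernyms, k=2)
print(suggestions)
```

In this toy setting the first suggestion ("veículo") is a plausible hypernym of "carro", while the second ("flor") is not, which mirrors the observation that only a portion of the top-ranked answers are useful and that the suggestions still need validation.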

Hugo Gonçalo Oliveira | Alexandre Rademaker | Fredson Silva de Souza Aguiar
