WordNet-Shp: Towards the Building of a Lexical Database for a Peruvian Minority Language

WordNet-like resources are lexical databases containing highly relevant information that can be exploited in more complex computational linguistics research and applications. Building them requires both manual and automatic tasks, which can be more arduous when the language is a minority one with few digital resources. This study focuses on the construction of an initial WordNet database for a low-resource indigenous language of Peru: Shipibo-Konibo (shp). First, the stages of development starting from a resource-scarce scenario (a bilingual shp-es dictionary) are described. Then, a synset alignment method is proposed that compares the definition glosses in the dictionary (written in Spanish) with the content of a Spanish WordNet, using word2vec similarity as the proximity measure. Finally, the synsets are evaluated against a manually annotated Gold Standard in Shipibo-Konibo. The results obtained are promising, and the resource is expected to serve further applications, such as word sense disambiguation and even machine translation for the shp-es language pair.
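The gloss-based alignment idea can be illustrated with a minimal sketch. Everything concrete in it is an assumption made for illustration: the abstract only states that word2vec similarity was the proximity measure, so the averaging of word vectors, the cosine comparison, the function names (`gloss_vector`, `cosine`, `align`), the synset identifiers, and the tiny hand-made vectors are hypothetical and do not come from the paper. A real setup would presumably load Spanish word2vec embeddings instead of `TOY_VECTORS`.

```python
import math

# Toy 3-dimensional "embeddings", invented for illustration only.
# They stand in for real Spanish word2vec vectors.
TOY_VECTORS = {
    "perro": [0.9, 0.1, 0.0],
    "animal": [0.8, 0.2, 0.1],
    "doméstico": [0.7, 0.3, 0.0],
    "edificio": [0.0, 0.8, 0.2],
    "vivienda": [0.1, 0.9, 0.0],
}


def gloss_vector(gloss, vectors):
    """Average the vectors of the known tokens; None if no token is known."""
    known = [vectors[t] for t in gloss.lower().split() if t in vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def align(dictionary_gloss, candidate_synsets, vectors):
    """Return (synset_id, score) for the most similar candidate gloss."""
    target = gloss_vector(dictionary_gloss, vectors)
    best_id, best_score = None, 0.0
    if target is None:
        return best_id, best_score
    for synset_id, synset_gloss in candidate_synsets.items():
        candidate = gloss_vector(synset_gloss, vectors)
        if candidate is None:
            continue
        score = cosine(target, candidate)
        if score > best_score:
            best_id, best_score = synset_id, score
    return best_id, best_score


# Hypothetical candidate synsets from a Spanish WordNet (IDs are made up).
candidates = {
    "synset-A": "animal doméstico",
    "synset-B": "edificio vivienda",
}
# Spanish definition gloss of a hypothetical dictionary entry.
best_id, best_score = align("perro animal", candidates, TOY_VECTORS)
print(best_id, round(best_score, 3))
```

In this toy setting the dictionary gloss is matched to the candidate whose gloss vector is closest; a practical pipeline would also need tokenization, stop-word handling, and a similarity threshold below which no alignment is made, none of which are specified in the abstract.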
