Extracting Synonyms from Bilingual Dictionaries

We present our progress in developing a novel algorithm to extract synonyms from bilingual dictionaries. Identifying and using synonyms plays a significant role in improving the performance of information access applications. The idea is to construct a translation graph from translation pairs, then to extract and consolidate cyclic paths to form bilingual sets of synonyms. An initial evaluation of the algorithm shows promising results in extracting Arabic-English bilingual synonyms. In the evaluation, we first converted the synsets in the Arabic WordNet into translation pairs (i.e., discarding word-sense memberships). Next, we applied our algorithm to rebuild these synsets. Comparing the original and extracted synsets, we obtained an F-measure of 82.3% for Arabic and 82.1% for English synset extraction.
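The graph-and-cycle idea can be illustrated with a minimal sketch. It is not the algorithm evaluated here: the function names, the cycle-length bound `max_len`, the merging rule `min_shared`, and the toy translation pairs are all assumptions made for illustration. The sketch builds an undirected bilingual graph from translation pairs, collects the node sets of short simple cycles with a bounded depth-first search, and merges cycles that share enough words into candidate bilingual synsets.

```python
from collections import defaultdict


def build_graph(pairs):
    """Undirected bipartite graph whose nodes are (language, word) tuples."""
    graph = defaultdict(set)
    for ar, en in pairs:
        graph[("ar", ar)].add(("en", en))
        graph[("en", en)].add(("ar", ar))
    return graph


def find_cycles(graph, max_len=6):
    """Return the node sets of simple cycles with 4..max_len nodes.

    Each cycle is searched only from its smallest node, so the same
    cycle is not explored once per starting point.
    """
    cycles = set()

    def dfs(start, node, path, visited):
        for nxt in graph[node]:
            if nxt == start and len(path) >= 4:
                cycles.add(frozenset(path))
            elif nxt not in visited and nxt > start and len(path) < max_len:
                dfs(start, nxt, path + [nxt], visited | {nxt})

    for start in list(graph):
        dfs(start, start, [start], {start})
    return cycles


def consolidate(cycles, min_shared=2):
    """Merge cycles sharing at least min_shared nodes (hypothetical rule)."""
    groups = [set(c) for c in cycles]
    merged = True
    while merged:
        merged = False
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                if len(groups[i] & groups[j]) >= min_shared:
                    groups[i] |= groups.pop(j)
                    merged = True
                    break
            if merged:
                break
    return groups


def extract_synsets(pairs, max_len=6, min_shared=2):
    """Return candidate bilingual synsets as {"ar": set, "en": set} dicts."""
    groups = consolidate(find_cycles(build_graph(pairs), max_len), min_shared)
    return [
        {
            "ar": {word for lang, word in group if lang == "ar"},
            "en": {word for lang, word in group if lang == "en"},
        }
        for group in groups
    ]


# Toy translation pairs, invented for illustration only.
pairs = [
    ("كتاب", "book"),
    ("كتاب", "volume"),
    ("مجلد", "book"),
    ("مجلد", "volume"),
    ("مجلد", "folder"),
]
synsets = extract_synsets(pairs)
```

In this toy input, "folder" is reachable only through a single translation pair, so it lies on no cycle and is left out of the candidate synset. The bound on cycle length keeps the search tractable on larger graphs; enumerating all elementary circuits would call for a dedicated algorithm instead of this bounded search.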
