PREDICT: Persian Reverse Dictionary

Finding the appropriate words to convey a concept (i.e., lexical access) is essential for effective communication. Reverse dictionaries address this need by helping individuals find the word(s) that correspond to a specific concept or idea. To the best of our knowledge, no such resource has been available for the Persian language. In this paper, we compare four different architectures for implementing a Persian reverse dictionary (PREDICT). We evaluate our models using (phrase, word) tuples extracted from the only Persian dictionaries available online, namely Amid, Moein, and Dehkhoda, where the phrase describes the word. Given the phrase, a model suggests the word(s) most relevant to the concept it conveys. The model is considered to perform well if the correct word is among its top suggestions. Our experiments show that a model consisting of Long Short-Term Memory (LSTM) units enhanced by an additive attention mechanism suffices to produce suggestions comparable to (or in some cases better than) the word in the original dictionary. The study also reveals that the model sometimes produces synonyms of the target word, which led us to introduce a new evaluation metric for reverse dictionaries, Synonym Accuracy: the percentage of cases in which the model produces the target word or one of its synonyms. Under this metric, our best model produces an accurate result within its top 100 suggestions at least 62% of the time.
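The Synonym Accuracy metric described above can be sketched as a top-k evaluation that counts a hit when either the target word or any of its synonyms appears among a model's top-k suggestions. The following is a minimal illustration, not the paper's implementation; the function name, the synonym-lookup dictionary, and the toy data are hypothetical.

```python
def synonym_accuracy(suggestions, gold, synonyms, k=100):
    """Fraction of test items for which the gold word OR one of its
    synonyms appears among the model's top-k suggestions.

    suggestions: list of ranked suggestion lists, one per test item
    gold:        list of target words, aligned with `suggestions`
    synonyms:    dict mapping a word to an iterable of its synonyms
                 (hypothetical structure; e.g., drawn from a WordNet)
    """
    hits = 0
    for ranked, word in zip(suggestions, gold):
        # Accept the target word itself or any known synonym of it.
        targets = {word} | set(synonyms.get(word, ()))
        if targets & set(ranked[:k]):
            hits += 1
    return hits / len(gold)
```

For example, with two test items where the second model output misses the target word but ranks one of its synonyms highly, the metric credits the synonym hit while plain top-k accuracy would not.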
