Are Crescia and Piadina the Same? Towards Identifying Synonymy or Non-Synonymy between Italian Words to Enable Crowdsourcing from Language Learners

We introduce a method for generating candidate pairs of related Italian words, which may or may not be synonyms, from the ConceptNet knowledge base. The pairs are intended to seed questions for a vocabulary trainer that combines exercises for building vocabulary skills with the implicit crowdsourcing of linguistic knowledge about the semantic relations between words. Our method relies on the idea that pairs of synonyms in one language tend to translate to pairs of synonyms in other languages. We generated 85k candidate pairs of Italian synonyms that can be used to produce questions for both teaching (3.8k pairs) and crowdsourcing (80k pairs). Follow-up work is still needed to generate a complementary set of questions.
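The translation-based idea can be illustrated with a minimal Python sketch. It is not the authors' pipeline and does not query ConceptNet: the `TRANSLATIONS` and `SYNONYMS` tables are tiny hand-made examples, the function names are hypothetical, and treating a shared translation as evidence of synonymy is a simplifying assumption made here for illustration.

```python
from itertools import combinations

# Toy, hand-made data (hypothetical; NOT extracted from ConceptNet).
# Italian word -> pivot language -> set of translations.
TRANSLATIONS = {
    "casa": {"en": {"house", "home"}, "fr": {"maison"}},
    "abitazione": {"en": {"dwelling", "home"}, "fr": {"habitation", "maison"}},
    "cane": {"en": {"dog"}, "fr": {"chien"}},
}

# Known synonym pairs in each pivot language (also hypothetical).
SYNONYMS = {
    "en": {frozenset({"house", "dwelling"})},
    "fr": {frozenset({"maison", "habitation"})},
}


def supporting_languages(word_a, word_b):
    """Return the pivot languages in which some translation of word_a is
    identical to, or a listed synonym of, some translation of word_b."""
    support = set()
    shared_langs = TRANSLATIONS[word_a].keys() & TRANSLATIONS[word_b].keys()
    for lang in shared_langs:
        for ta in TRANSLATIONS[word_a][lang]:
            for tb in TRANSLATIONS[word_b][lang]:
                # Assumption: a shared translation counts as evidence too.
                if ta == tb or frozenset({ta, tb}) in SYNONYMS.get(lang, set()):
                    support.add(lang)
    return support


def candidate_pairs(min_support=1):
    """Collect Italian word pairs backed by at least min_support languages."""
    pairs = {}
    for a, b in combinations(sorted(TRANSLATIONS), 2):
        langs = supporting_languages(a, b)
        if len(langs) >= min_support:
            pairs[(a, b)] = sorted(langs)
    return pairs


if __name__ == "__main__":
    for pair, langs in candidate_pairs().items():
        print(pair, "supported by", langs)
```

In this sketch, the number of supporting pivot languages acts as a rough confidence signal. One could imagine routing strongly supported pairs to teaching questions and weakly supported ones to crowdsourcing questions, but the criteria actually used for that split are not stated in the abstract.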
