Robust Fragment-Based Framework for Cross-lingual Sentence Retrieval

November 7–11, 2021. ©2021 Association for Computational Linguistics

Nattapol Trijakwanich*, Peerat Limkonchotiwat*, Raheem Sarwar‡, Wannaphong Phatthiyaphaibun*, Ekapol Chuangsuwanich†, Sarana Nutanong*

*School of Information Science and Technology, VISTEC, Thailand
‡RGCL, University of Wolverhampton, United Kingdom
†Department of Computer Engineering, Chulalongkorn University, Thailand

{nattapol.t_s17,peerat.l_s19}@vistec.ac.th
{wannaphong.p_s21,snutanon}@vistec.ac.th
R.Sarwar4@wlv.ac.uk, ekapolc@cp.eng.chula.ac.th

Abstract