A Study of Neural Matching Models for Cross-lingual IR

In this study, we investigate interaction-based neural matching models for ad-hoc cross-lingual information retrieval (CLIR) using cross-lingual word embeddings (CLWEs). Through experiments on the CLEF collection over four language pairs, we evaluate and provide insight into different neural model architectures, different ways of representing query-document interactions, and word-pair similarity distributions in CLIR. This study paves the way for learning an end-to-end CLIR system using CLWEs.
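As a rough illustration of what a query-document interaction can look like in this setting, the sketch below builds a cosine-similarity matrix between query-term and document-term vectors and summarizes each query term's similarities as a binned histogram. It is a minimal sketch under stated assumptions, not the models evaluated in the study: the random vectors stand in for CLWEs already mapped into a shared space, and the function names, the bin count, and the log scaling are hypothetical choices made for illustration.

```python
import numpy as np


def interaction_matrix(query_vecs, doc_vecs):
    """Cosine similarity between every query term and every document term."""
    q = query_vecs / np.linalg.norm(query_vecs, axis=1, keepdims=True)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    # Clip to guard against tiny floating-point overshoot beyond [-1, 1].
    return np.clip(q @ d.T, -1.0, 1.0)


def similarity_histogram(sim_row, n_bins=5):
    """Log-scaled counts of one query term's similarities over fixed bins."""
    counts, _ = np.histogram(sim_row, bins=n_bins, range=(-1.0, 1.0))
    return np.log1p(counts)


# Placeholder embeddings (assumption): 3 query terms in one language and
# 10 document terms in another, both already in a shared 8-dimensional space.
rng = np.random.default_rng(0)
query_vecs = rng.normal(size=(3, 8))
doc_vecs = rng.normal(size=(10, 8))

sims = interaction_matrix(query_vecs, doc_vecs)                     # shape (3, 10)
hists = np.stack([similarity_histogram(row) for row in sims])      # shape (3, 5)
```

The matrix `sims` keeps the position of each word pair, while `hists` discards position and keeps only the distribution of similarity values per query term; these are two of the ways such interactions can be presented to a downstream ranking network.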
