Teaching a New Dog Old Tricks: Resurrecting Multilingual Retrieval Using Zero-Shot Learning

While billions of non-English speaking users rely on search engines every day, the problem of ad-hoc information retrieval is rarely studied for non-English languages. This is primarily due to a lack of data set that are suitable to train ranking algorithms. In this paper, we tackle the lack of data by leveraging pre-trained multilingual language models to transfer a retrieval system trained on English collections to non-English queries and documents. Our model is evaluated in a zero-shot setting, meaning that we use them to predict relevance scores for query-document pairs in languages never seen during training. Our results show that the proposed approach can significantly outperform unsupervised retrieval techniques for Arabic, Chinese Mandarin, and Spanish. We also show that augmenting the English training collection with some examples from the target language can sometimes improve performance.

[1]  Fredric C. Gey,et al.  The TREC 2002 Arabic/English CLIR Track , 2002, TREC.

[2]  Jamie Callan,et al.  Deeper Text Understanding for IR with Contextual Neural Language Modeling , 2019, SIGIR.

[3]  Jimmy J. Lin,et al.  Anserini , 2018, Journal of Data and Information Quality.

[4]  Yiqun Liu,et al.  Sogou-QCL: A New Dataset with Click Relevance Label , 2018, SIGIR.

[5]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[6]  W. Bruce Croft,et al.  A Deep Relevance Matching Model for Ad-hoc Retrieval , 2016, CIKM.

[7]  Goran Glavas,et al.  Unsupervised Cross-Lingual Information Retrieval Using Monolingual Data Only , 2018, SIGIR.

[8]  Kevin Duh,et al.  Cross-Lingual Learning-to-Rank with Shared Representations , 2018, NAACL.

[9]  Young-Bum Kim,et al.  Cross-Lingual Transfer Learning for POS Tagging without Cross-Lingual Resources , 2017, EMNLP.

[10]  Ellen M. Voorhees,et al.  Overview of the TREC 2004 Robust Retrieval Track , 2004 .

[11]  Marie-Francine Moens,et al.  Monolingual and Cross-Lingual Information Retrieval Models Based on (Bilingual) Word Embeddings , 2015, SIGIR.

[12]  Carol Peters,et al.  Cross-Language Information Retrieval (CLIR) Track Overview , 1997, TREC.

[13]  Tie-Yan Liu,et al.  Learning to rank: from pairwise approach to listwise approach , 2007, ICML '07.

[14]  Ming-Wei Chang,et al.  BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.

[15]  Claudio Carpineto,et al.  A Survey of Automatic Query Expansion in Information Retrieval , 2012, CSUR.

[16]  Ellen M. Voorhees,et al.  The TREC 2005 robust track , 2006, SIGF.

[17]  Md. Mustafizur Rahman,et al.  Neural information retrieval: at the end of the early years , 2017, Information Retrieval Journal.

[18]  W. Bruce Croft,et al.  A Markov random field model for term dependencies , 2005, SIGIR '05.

[19]  Zhiyuan Liu,et al.  End-to-End Neural Ad-hoc Ranking with Kernel Pooling , 2017, SIGIR.

[20]  Nazli Goharian,et al.  CEDR: Contextualized Embeddings for Document Ranking , 2019, SIGIR.

[21]  W. Bruce Croft,et al.  A Comparison of Retrieval Models using Term Dependencies , 2014, CIKM.

[22]  Martin Wattenberg,et al.  Google’s Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation , 2016, TACL.

[23]  Donna Harman,et al.  The fourth text REtrieval conference , 1996 .

[24]  Gerard de Melo,et al.  PACRR: A Position-Aware Neural IR Model for Relevance Matching , 2017, EMNLP.

[25]  Jianfeng Gao,et al.  A Human Generated MAchine Reading COmprehension Dataset , 2018 .

[26]  Ruslan Salakhutdinov,et al.  Transfer Learning for Sequence Tagging with Hierarchical Recurrent Networks , 2016, ICLR.

[27]  Ellen M. Voorhees,et al.  The Sixth Text REtrieval Conference (TREC-6) , 2000, Inf. Process. Manag..

[28]  D. K. Harmon,et al.  Overview of the Third Text Retrieval Conference (TREC-3) , 1996 .

[29]  Donna K. Harman,et al.  Overview of the Fifth Text REtrieval Conference (TREC-5) , 1996, TREC.

[30]  Guillaume Lample,et al.  Cross-lingual Language Model Pretraining , 2019, NeurIPS.

[31]  Ellen M. Voorhees,et al.  TREC 2014 Web Track Overview , 2015, TREC.

[32]  Bhaskar Mitra,et al.  An Introduction to Neural Information Retrieval , 2018, Found. Trends Inf. Retr..

[33]  Donna K. Harman,et al.  Overview of the Fourth Text REtrieval Conference (TREC-4) , 1995, TREC.

[34]  Guillaume Lample,et al.  Word Translation Without Parallel Data , 2017, ICLR.

[35]  C. J. van Rijsbergen,et al.  Probabilistic models of information retrieval based on measuring the divergence from randomness , 2002, TOIS.

[36]  Ellen M. Voorhees,et al.  Information Technology: The Sixth Text Retrieval Conference (TREC-6) | NIST , 1998 .

[37]  Sebastian Schuster,et al.  Cross-lingual Transfer Learning for Multilingual Task Oriented Dialog , 2018, NAACL.

[38]  Jimmy J. Lin,et al.  Cross-Lingual Relevance Transfer for Document Retrieval , 2019, ArXiv.

[39]  Donna K. Harman,et al.  Overview of the Third Text REtrieval Conference (TREC-3) , 1995, TREC.

[40]  Ellen M. Voorhees,et al.  The fifth text REtrieval conference (TREC-5) , 1997 .

[41]  Jimmy J. Lin,et al.  Simple Applications of BERT for Ad Hoc Document Retrieval , 2019, ArXiv.