Cross-Lingual Training with Dense Retrieval for Document Retrieval

Dense retrieval has shown great success in passage ranking for English. However, its effectiveness in document retrieval for non-English languages remains unexplored due to limited training resources. In this work, we investigate techniques for transferring document ranking models trained on English annotations to multiple non-English languages. Experiments on test collections in six languages (Chinese, Arabic, French, Hindi, Bengali, Spanish) from diverse language families reveal that zero-shot model-based transfer using mBERT improves search quality in non-English monolingual retrieval. We also find that weakly supervised target-language transfer yields performance competitive with generation-based target-language transfer, which requires external translators and query generators.
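To make the zero-shot model-based transfer concrete, the sketch below shows a minimal DPR-style bi-encoder built on mBERT: queries and documents are encoded independently and ranked by dot-product similarity, so a retriever trained on English relevance labels can be applied directly to a non-English collection. The checkpoint name, [CLS] pooling, truncation length, and the toy French query/documents are illustrative assumptions, not the exact setup described in the abstract.

```python
# Minimal sketch of zero-shot cross-lingual dense retrieval with an mBERT
# bi-encoder. Illustrative only: model choice, pooling, and scoring are
# assumptions, not the paper's exact configuration.
import torch
from transformers import AutoTokenizer, AutoModel

MODEL_NAME = "bert-base-multilingual-cased"  # assumed mBERT checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
encoder = AutoModel.from_pretrained(MODEL_NAME)
encoder.eval()

def encode(texts):
    """Encode texts into dense vectors using the [CLS] representation."""
    batch = tokenizer(texts, padding=True, truncation=True,
                      max_length=256, return_tensors="pt")
    with torch.no_grad():
        out = encoder(**batch)
    return out.last_hidden_state[:, 0]  # (batch, hidden) [CLS] embeddings

# An English-trained retriever applied zero-shot to a non-English collection:
query = "effets du changement climatique sur l'agriculture"  # French query
docs = ["Le changement climatique réduit les rendements agricoles ...",
        "Histoire de la peinture impressionniste ..."]

q_emb = encode([query])           # (1, hidden)
d_emb = encode(docs)              # (num_docs, hidden)
scores = q_emb @ d_emb.T          # dot-product relevance scores
ranking = scores.squeeze(0).argsort(descending=True)
print(ranking.tolist())           # document indices, most relevant first
```

In a full system the document encoder would be fine-tuned on English query-passage pairs (e.g., with in-batch negatives) before being applied zero-shot to the target languages; the scoring and indexing side stays unchanged.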
