A Systematic Evaluation of Transfer Learning and Pseudo-labeling with BERT-based Ranking Models

Due to high annotation costs, making the best use of existing human-created training data is an important research direction. We therefore carry out a systematic evaluation of the transferability of BERT-based neural ranking models across five English datasets. Previous studies focused primarily on zero-shot and few-shot transfer from a large dataset to a dataset with a small number of queries. In contrast, each of our collections has a substantial number of queries, which enables a full-shot evaluation mode and improves the reliability of our results. Furthermore, since the licenses of source datasets often prohibit commercial use, we compare transfer learning to training on pseudo-labels generated by a BM25 scorer. We find that training on pseudo-labels, possibly with subsequent fine-tuning on a modest number of annotated queries, can produce a model that is competitive with or better than one obtained via transfer learning. Yet, the stability and/or effectiveness of few-shot training still needs improvement, since it can sometimes degrade the performance of a pretrained model.
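A minimal sketch of the pseudo-labeling setup follows, assuming the rank_bm25 and sentence-transformers packages; the function bm25_pseudo_labels, the toy data, and the positive/negative selection heuristic are illustrative assumptions, not the paper's actual pipeline. The idea is to use BM25 rankings to create noisy positive/negative query-document pairs and fine-tune a BERT cross-encoder re-ranker on them:

    from rank_bm25 import BM25Okapi
    from sentence_transformers import CrossEncoder, InputExample
    from torch.utils.data import DataLoader

    def bm25_pseudo_labels(queries, corpus, top_k=100, n_pos=1):
        # Index the corpus with BM25 (whitespace tokenization for brevity).
        bm25 = BM25Okapi([doc.split() for doc in corpus])
        examples = []
        for q in queries:
            scores = bm25.get_scores(q.split())
            ranked = sorted(range(len(corpus)),
                            key=scores.__getitem__, reverse=True)[:top_k]
            # Treat the highest-scoring document(s) as pseudo-positives ...
            for i in ranked[:n_pos]:
                examples.append(InputExample(texts=[q, corpus[i]], label=1.0))
            # ... and the remaining top-k documents as pseudo-negatives.
            for i in ranked[n_pos:]:
                examples.append(InputExample(texts=[q, corpus[i]], label=0.0))
        return examples

    # Toy data for illustration only.
    queries = ["what is bm25"]
    corpus = ["BM25 is a bag-of-words ranking function.",
              "BERT is a pretrained transformer encoder.",
              "Pooling affects test collection bias."]
    train_examples = bm25_pseudo_labels(queries, corpus, top_k=3)

    # Fine-tune a BERT re-ranker on the pseudo-labeled pairs; a small set of
    # annotated queries could subsequently be used for few-shot fine-tuning.
    model = CrossEncoder("bert-base-uncased", num_labels=1)
    model.fit(train_dataloader=DataLoader(train_examples, shuffle=True,
                                          batch_size=16),
              epochs=1)

In a realistic setting, BM25 would of course be run over an inverted index of a large collection rather than an in-memory list, and one would sample negatives from deeper in the ranking rather than taking the entire top-k tail.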
