From doc2query to docTTTTTquery

The setup in this work follows doc2query, but with T5 [8] as the expansion model. T5 is a sequence-to-sequence model whose encoder-decoder architecture is pretrained with an objective similar to BERT's [3]. In this model, all target tasks are cast as sequence-to-sequence tasks; in our case, we feed the passage as input and train the model to generate the corresponding question. We train the model with a constant learning rate of 10^{-4} for 4k iterations with batches of size 256, which corresponds to two epochs on the MS MARCO training set. We use a maximum of 512 input tokens and 64 output tokens; in the MS MARCO dataset, none of the inputs or outputs need to be truncated at these lengths.

Consistent with Nogueira et al. [7], we find that the top-k sampling decoder [4] produces more effective queries than beam search; we use k = 10. In all experiments, we use T5-base, as we did not observe any improvement in retrieval effectiveness with the large model. We did not experiment with T5-3B or T5-11B due to their computational cost.
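To make the setup concrete, the sketch below shows one fine-tuning step (passage in, question out) and top-k sampling decoding under the hyperparameters above. It is a minimal illustration using the Hugging Face transformers API rather than the original training code; the optimizer choice (Adam) and the toy passage/question pair are assumptions for illustration, as the text above specifies only the learning rate, batch size, sequence lengths, and k.

```python
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

# One fine-tuning step: passage -> question. The optimizer is an assumption;
# the paper specifies only a constant learning rate of 10^-4.
passages = ["The Manhattan Project produced the first nuclear weapons during World War II."]
questions = ["what was the manhattan project"]
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# Inputs capped at 512 tokens, targets at 64, matching the setup above.
inputs = tokenizer(passages, padding=True, truncation=True,
                   max_length=512, return_tensors="pt")
labels = tokenizer(questions, padding=True, truncation=True,
                   max_length=64, return_tensors="pt").input_ids
labels[labels == tokenizer.pad_token_id] = -100  # mask padding in the loss

loss = model(input_ids=inputs.input_ids,
             attention_mask=inputs.attention_mask,
             labels=labels).loss
loss.backward()
optimizer.step()
optimizer.zero_grad()

# Query generation with top-k sampling (k = 10), up to 64 output tokens.
model.eval()
with torch.no_grad():
    sampled = model.generate(inputs.input_ids,
                             max_length=64,
                             do_sample=True,
                             top_k=10,
                             num_return_sequences=3)
for ids in sampled:
    print(tokenizer.decode(ids, skip_special_tokens=True))
```

Sampling several queries per passage (num_return_sequences above) reflects how expansion is typically applied: each sampled query is appended to the passage before indexing, so the diversity that top-k sampling provides over beam search matters.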