From doc2query to docTTTTTquery

The setup in this work follows doc2query, but with T5 [8] as the expansion model. T5 is a sequence-to-sequence model whose encoder-decoder architecture is pretrained with an objective similar to BERT's [3]. In this model, all target tasks are cast as sequence-to-sequence tasks; in our case, we feed the passage as input and train the model to generate the corresponding question. We train the model with a constant learning rate of 10^{-4} for 4k iterations with batches of size 256, which corresponds to two epochs on the MS MARCO training set. We use a maximum of 512 input tokens and 64 output tokens; in the MS MARCO dataset, none of the inputs or outputs need to be truncated at these lengths.

Consistent with Nogueira et al. [7], we find that the top-k sampling decoder [4] produces more effective queries than beam search; we use k = 10. In all experiments, we use T5-base, as we did not observe any improvement in retrieval effectiveness with the large model. We did not experiment with T5-3B or T5-11B due to their computational cost.
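To make the setup concrete, the sketch below shows one fine-tuning step (passage in, question out) and top-k sampling decoding under the hyperparameters above. It is a minimal illustration using the Hugging Face transformers API rather than the original training code; the optimizer choice (Adam) and the toy passage/question pair are assumptions for illustration, as the text above specifies only the learning rate, batch size, sequence lengths, and k.

```python
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

# One fine-tuning step: passage -> question. The optimizer is an assumption;
# the paper specifies only a constant learning rate of 10^-4.
passages = ["The Manhattan Project produced the first nuclear weapons during World War II."]
questions = ["what was the manhattan project"]
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# Inputs capped at 512 tokens, targets at 64, matching the setup above.
inputs = tokenizer(passages, padding=True, truncation=True,
                   max_length=512, return_tensors="pt")
labels = tokenizer(questions, padding=True, truncation=True,
                   max_length=64, return_tensors="pt").input_ids
labels[labels == tokenizer.pad_token_id] = -100  # mask padding in the loss

loss = model(input_ids=inputs.input_ids,
             attention_mask=inputs.attention_mask,
             labels=labels).loss
loss.backward()
optimizer.step()
optimizer.zero_grad()

# Query generation with top-k sampling (k = 10), up to 64 output tokens.
model.eval()
with torch.no_grad():
    sampled = model.generate(inputs.input_ids,
                             max_length=64,
                             do_sample=True,
                             top_k=10,
                             num_return_sequences=3)
for ids in sampled:
    print(tokenizer.decode(ids, skip_special_tokens=True))
```

Sampling several queries per passage (num_return_sequences above) reflects how expansion is typically applied: each sampled query is appended to the passage before indexing, so the diversity that top-k sampling provides over beam search matters.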