Poly-encoders: Architectures and Pre-training Strategies for Fast and Accurate Multi-sentence Scoring

The use of deep pre-trained transformers has led to remarkable progress in a number of applications (Devlin et al., 2018). For tasks that make pairwise comparisons between sequences, matching a given input with a corresponding label, two approaches are common: Cross-encoders, which perform full self-attention over the pair, and Bi-encoders, which encode each member of the pair separately. The former often performs better but is too slow for practical use. In this work, we develop a new transformer architecture, the Poly-encoder, that learns global rather than token-level self-attention features. We perform a detailed comparison of all three approaches, including which pre-training and fine-tuning strategies work best. We show that our models achieve state-of-the-art results on four tasks, that Poly-encoders are faster than Cross-encoders and more accurate than Bi-encoders, and that the best results are obtained by pre-training on large datasets similar to the downstream tasks.
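The abstract does not spell out how the Poly-encoder scores a candidate. The sketch below assumes the formulation usually described for it: m learned code vectors each attend over the context encoder's token outputs to produce m global context features, the candidate embedding then attends over those m features to form a single context vector, and the score is a dot product. A Bi-encoder score is shown alongside for contrast. Random vectors stand in for transformer outputs, and every function name here (`poly_context_features`, `poly_score`, `bi_encoder_score`) is hypothetical; a real model would use learned codes and batched tensors.

```python
import math
import random
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Vector, keys: List[Vector]) -> Vector:
    # Dot-product attention in which the keys also serve as the values.
    weights = softmax([dot(query, k) for k in keys])
    dim = len(keys[0])
    return [sum(w * k[d] for w, k in zip(weights, keys)) for d in range(dim)]


def bi_encoder_score(ctxt_vec: Vector, cand_vec: Vector) -> float:
    # Bi-encoder: one fixed context vector and one candidate vector.
    return dot(ctxt_vec, cand_vec)


def poly_context_features(ctxt_tokens: List[Vector], codes: List[Vector]) -> List[Vector]:
    # Step 1 (assumed formulation): each of the m codes attends over the
    # context token outputs, giving m global features. This does not depend
    # on the candidate, so it can be computed once per context and cached.
    return [attend(code, ctxt_tokens) for code in codes]


def poly_score(global_feats: List[Vector], cand_vec: Vector) -> float:
    # Step 2: the candidate embedding attends over the m global features to
    # build a candidate-aware context vector, then scores by dot product.
    ctxt_vec = attend(cand_vec, global_feats)
    return dot(ctxt_vec, cand_vec)


if __name__ == "__main__":
    # Hypothetical toy sizes; random vectors replace real encoder outputs.
    random.seed(0)
    dim, n_tokens, m, n_cands = 8, 5, 3, 4
    ctxt_tokens = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_tokens)]
    codes = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(m)]
    candidates = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_cands)]

    feats = poly_context_features(ctxt_tokens, codes)  # once per context
    scores = [poly_score(feats, c) for c in candidates]  # cheap per candidate
    best = max(range(n_cands), key=lambda i: scores[i])
    print("best candidate index:", best)
```

Under these assumptions, candidate embeddings can be precomputed as in a Bi-encoder, and the only per-candidate work is one small attention over m vectors. A Cross-encoder, by contrast, must rerun the full transformer on every context-candidate pair, which is the per-pair cost behind the abstract's "too slow for practical use".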