Curriculum Learning Strategies for IR

Neural ranking models are traditionally trained on a series of random batches, sampled uniformly from the entire training set. Curriculum learning has recently been shown to improve neural models' effectiveness by sampling batches non-uniformly, going from easy to difficult instances during training. In the context of neural Information Retrieval (IR), curriculum learning has not yet been explored, so it remains unclear (1) how to measure the difficulty of training instances and (2) how to transition from easy to difficult instances during training. To address both challenges and determine whether curriculum learning is beneficial for neural ranking models, we need large-scale datasets and a retrieval task that allow us to conduct a wide range of experiments. For this purpose, we turn to the task of conversation response ranking: ranking responses given the conversation history. To deal with challenge (1), we explore scoring functions that measure the difficulty of conversations based on different input spaces. To address challenge (2), we evaluate different pacing functions, which determine the pace at which we move from easy to difficult instances. We find that, overall, simply by intelligently sorting the training data (i.e., by performing curriculum learning) we can improve retrieval effectiveness by up to 2%. The source code is available at https://github.com/Guzpenha/transformers_cl.
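To make the two ingredients concrete, below is a minimal sketch of how a scoring function and a pacing function interact during training. It assumes a root-shaped pacing function (in the spirit of competence-based curriculum learning) and a toy difficulty score based on conversation length; the names `root_pacing`, `curriculum_batches`, and `difficulty_fn` are illustrative and are not taken from the paper's actual implementation.

```python
import math
import random

def root_pacing(step, total_steps, initial_fraction=0.33):
    """Root-shaped pacing function (assumed, in the spirit of
    competence-based curriculum learning): the fraction of the
    easy-to-difficult sorted data made available grows quickly at
    first, then levels off, reaching 1.0 when step == total_steps."""
    fraction = math.sqrt(
        step * (1 - initial_fraction ** 2) / total_steps + initial_fraction ** 2
    )
    return min(1.0, fraction)

def curriculum_batches(instances, difficulty_fn, total_steps, batch_size):
    """Yield batches sampled only from the easiest slice of the data;
    the slice size at each step is controlled by the pacing function."""
    # Sort the training data once, from easy to difficult, using the
    # chosen scoring function (challenge 1).
    ordered = sorted(instances, key=difficulty_fn)
    for step in range(total_steps):
        # The pacing function (challenge 2) decides how much of the
        # sorted data the model is allowed to see at this step.
        cutoff = max(batch_size, int(root_pacing(step, total_steps) * len(ordered)))
        yield random.sample(ordered[:cutoff], batch_size)

# Hypothetical difficulty score: longer conversation histories are
# assumed to be harder to rank responses for.
conversations = [{"history": "hi " * n, "response": "ok"} for n in range(1, 101)]
for batch in curriculum_batches(
    conversations,
    difficulty_fn=lambda c: len(c["history"].split()),
    total_steps=5,
    batch_size=4,
):
    print([len(c["history"].split()) for c in batch])
```

Note that random uniform sampling is recovered as the special case where the pacing function returns 1.0 at every step; swapping in a different scoring or pacing function only changes the sort key and the cutoff schedule, which is what makes this setup convenient for the kind of wide-ranging comparison described above.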
