Studying Catastrophic Forgetting in Neural Ranking Models

Several deep neural ranking models have been proposed in the recent IR literature. While their transferability to a single target domain has been widely addressed using traditional domain adaptation strategies, the question of their cross-domain transferability remains under-studied. We study here to what extent neural ranking models catastrophically forget knowledge acquired from previously observed domains after acquiring new knowledge, leading to a performance decrease on those earlier domains. Our experiments show that the effectiveness of neural IR ranking models comes at the cost of catastrophic forgetting, and that a lifelong learning strategy using a cross-domain regularizer successfully mitigates the problem. Using an explanatory approach built on a regression model, we also show the effect of domain characteristics on the rise of catastrophic forgetting. We believe the obtained results can be useful for both theoretical and practical future work in neural IR.
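Regularization-based lifelong learning typically mitigates forgetting by penalizing drift of parameters that were important on earlier domains, in the spirit of Elastic Weight Consolidation. As a minimal sketch (the function name, the use of a diagonal Fisher estimate, and the weight `lam` are illustrative assumptions, not the paper's exact regularizer):

```python
import numpy as np

def cross_domain_penalty(params, old_params, fisher, lam=0.4):
    """EWC-style quadratic penalty added to the new-domain ranking loss.

    params     : list of parameter arrays after training on the new domain
    old_params : parameters learned on the previous domain(s)
    fisher     : diagonal Fisher estimates measuring each parameter's
                 importance for the old domain(s)
    lam        : trade-off between plasticity (new domain) and
                 stability (old domains) -- illustrative value
    """
    return 0.5 * lam * sum(
        float(np.sum(f * (p - p_old) ** 2))
        for p, p_old, f in zip(params, old_params, fisher)
    )
```

Parameters that the old domain relied on heavily (large Fisher values) are anchored strongly, while unimportant ones remain free to adapt to the new domain; if the parameters do not move at all, the penalty is zero.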
