What does BERT know about books, movies and music? Probing BERT for Conversational Recommendation

Heavily pre-trained transformer models such as BERT have recently been shown to be remarkably powerful at language modelling, achieving impressive results on numerous downstream tasks. They have also been shown to implicitly store factual knowledge in their parameters after pre-training. Understanding what the pre-training procedure of such language models actually learns is a crucial step towards using and improving them for Conversational Recommender Systems (CRS). We first study how much off-the-shelf, pre-trained BERT “knows” about recommendation items such as books, movies and music. To analyze the knowledge stored in BERT’s parameters, we use different probes (i.e., tasks that examine a trained model with respect to certain properties) that require different types of knowledge to solve, namely content-based and collaborative-based. Content-based knowledge requires the model to match the titles of items with their content information, such as textual descriptions and genres. In contrast, collaborative-based knowledge requires the model to match items with similar ones, according to community interactions such as ratings. We resort to BERT’s Masked Language Modelling (MLM) head to probe its knowledge about the genre of items with cloze-style prompts. In addition, we employ BERT’s Next Sentence Prediction (NSP) head and representation similarity (SIM) to compare relevant and non-relevant search and recommendation query-document inputs, exploring whether BERT can, without any fine-tuning, rank relevant items first. Finally, we study how BERT performs in a conversational recommendation downstream task. To this end, we fine-tune BERT to act as a retrieval-based CRS. Overall, our experiments show that: (i) BERT has knowledge stored in its parameters about the content of books, movies and music; (ii) it has more content-based knowledge than collaborative-based knowledge; and (iii) it fails on conversational recommendation when faced with adversarial data.
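The following is a minimal sketch of the kind of cloze-style MLM probe described above, assuming the HuggingFace transformers library and the bert-base-uncased checkpoint; the prompt template and item are illustrative placeholders, not the paper's exact probes.

```python
# Sketch of a cloze-style genre probe using BERT's MLM head (assumed setup,
# not the paper's released code): mask the genre slot and inspect whether a
# correct genre token ranks highly among BERT's predictions.
import torch
from transformers import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

# Hypothetical prompt: item title plus a masked genre slot.
prompt = "The Lord of the Rings is a movie of the [MASK] genre."
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Locate the [MASK] position and take the top-5 vocabulary predictions;
# a correct genre near the top is read as evidence of content-based knowledge.
mask_pos = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
top_ids = logits[0, mask_pos].topk(5).indices[0].tolist()
print(tokenizer.convert_ids_to_tokens(top_ids))
```

The same pattern extends to the collaborative probes by swapping the prompt for item-to-item comparisons, and to the NSP/SIM probes by scoring relevant versus non-relevant query-document pairs instead of masked tokens.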
