Large Language Models as Zero-Shot Conversational Recommenders

In this paper, we present empirical studies on conversational recommendation tasks using representative large language models in a zero-shot setting, with three primary contributions. (1) Data: To gain insights into model behavior in "in-the-wild" conversational recommendation scenarios, we construct a new dataset of recommendation-related conversations by scraping a popular discussion website. This is the largest public real-world conversational recommendation dataset to date. (2) Evaluation: On the new dataset and two existing conversational recommendation datasets, we observe that even without fine-tuning, large language models can outperform existing fine-tuned conversational recommendation models. (3) Analysis: We propose various probing tasks to investigate the mechanisms behind the remarkable performance of large language models in conversational recommendation. We analyze both the large language models' behaviors and the characteristics of the datasets, providing a holistic understanding of the models' effectiveness and limitations, and suggesting directions for the design of future conversational recommenders.
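
As a rough illustration of the zero-shot setup described above, the sketch below prompts a chat LLM with the dialogue history and parses a ranked list of item titles from the reply. It is a minimal sketch assuming an OpenAI-style chat-completions client; the model name, prompt wording, and parsing are illustrative placeholders, not the paper's exact configuration.

```python
# Minimal sketch of zero-shot conversational recommendation with an LLM.
# Assumes an OpenAI-style chat-completions client; model name, prompt wording,
# and output parsing are illustrative, not the paper's exact setup.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def recommend_zero_shot(dialogue_turns: list[str], n_items: int = 20) -> list[str]:
    """Ask the LLM for a ranked list of item titles given the conversation so far."""
    conversation = "\n".join(dialogue_turns)
    prompt = (
        "Below is a conversation between a user and a recommender system.\n"
        f"{conversation}\n\n"
        f"Recommend {n_items} movies the user would likely enjoy next, "
        "as a numbered list of titles only."
    )
    response = client.chat.completions.create(
        model="gpt-4",  # placeholder; any instruction-following LLM could be used
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # deterministic output for repeatable evaluation
    )
    text = response.choices[0].message.content
    # Keep only the title portion of each numbered line, e.g. "1. Inception" -> "Inception"
    return [
        line.split(".", 1)[-1].strip()
        for line in text.splitlines()
        if line.strip()
    ][:n_items]


# Example usage: in an evaluation pipeline, the generated titles would then be
# matched against the dataset's item catalogue before computing metrics such as Recall@k.
ranked = recommend_zero_shot([
    "User: I loved Interstellar and Arrival. Any suggestions?",
    "System: Do you prefer hard sci-fi or something lighter?",
    "User: Hard sci-fi, please.",
])
```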
