Multimodal Conversational Fashion Recommendation with Positive and Negative Natural-Language Feedback

In a real-world shopping scenario, users can give natural-language feedback to a shopping assistant, expressing satisfaction positively with “I like” or negatively with “I dislike”, depending on the quality of the recommended fashion products. A multimodal conversational recommender system (using text and images in particular) aims to replicate this process by eliciting users’ dynamic preferences from their natural-language feedback and updating the visual recommendations to satisfy the users’ current needs over multi-turn interactions. However, the impact of positive and negative natural-language feedback on the effectiveness of multimodal conversational recommendation has not yet been fully explored. Since no conversational recommendation datasets contain both positive and negative natural-language feedback, existing research on multimodal conversational recommendation has simplified the task by constraining the users’ natural-language expressions (i.e. users either only describe their preferred attributes as positive feedback, or reject the undesired recommendations without any natural-language critique). To explore multimodal conversational recommendation with both positive and negative natural-language feedback, we investigate how effectively recent multimodal conversational recommendation models incorporate the users’ preferences over time from positive and negative natural-language feedback on the visual recommendations. We also propose an approach for generating both positive and negative natural-language critiques of the recommendations within an existing user simulator. Following previous work, we train and evaluate two existing conversational recommendation models, using the user simulator with positive and negative feedback as a surrogate for real human users. Extensive experiments on a well-known fashion dataset demonstrate that positive natural-language feedback is more informative about the users’ preferences than negative natural-language feedback.
