Multi-modal Preference Modeling for Product Search

The visual preferences of users have been largely ignored by existing product search methods. In this work, we propose a multi-modal personalized product search method that aims to retrieve products which are not only relevant to the submitted textual query but also match the user's preferences in both the textual and visual modalities. To achieve this goal, we first leverage the also_view and buy_after_viewing products to construct visual and textual latent spaces, which are expected to preserve the visual similarity and semantic similarity of products, respectively. We then propose a translation-based search model (TranSearch) to 1) learn a multi-modal latent space on top of the pre-trained visual and textual latent spaces; and 2) map users, queries, and products into this space for direct matching. TranSearch is trained with a comparative (pairwise) learning strategy, so that the multi-modal latent space is oriented toward personalized ranking during training. Experiments on real-world datasets validate the effectiveness of our method: the results demonstrate that it outperforms the state-of-the-art methods by a large margin.
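
To make the pipeline concrete, below is a minimal PyTorch sketch of the idea described above: pre-trained textual and visual product vectors are translated into a shared multi-modal latent space, users and queries are mapped into the same space, and the model is trained with a pairwise (comparative) ranking loss so that purchased products score higher than sampled negatives. The module names, dimensions, fusion by concatenation, the additive user-plus-query target, and the inner-product scoring are illustrative assumptions, not the exact TranSearch architecture.

import torch
import torch.nn as nn
import torch.nn.functional as F


class TranSearchSketch(nn.Module):
    """Hypothetical sketch of a translation-based multi-modal search model."""

    def __init__(self, num_users, text_dim, visual_dim, latent_dim=128):
        super().__init__()
        # Learnable user embeddings; queries and products arrive as
        # pre-trained textual / visual vectors (assumed given).
        self.user_emb = nn.Embedding(num_users, latent_dim)
        # "Translation" layers mapping each pre-trained modality space
        # into the shared multi-modal latent space.
        self.text_proj = nn.Linear(text_dim, latent_dim)
        self.visual_proj = nn.Linear(visual_dim, latent_dim)
        self.query_proj = nn.Linear(text_dim, latent_dim)
        # Fuse the two translated product views (assumption: concat + linear).
        self.fuse = nn.Linear(2 * latent_dim, latent_dim)

    def product_repr(self, prod_text, prod_visual):
        t = torch.tanh(self.text_proj(prod_text))
        v = torch.tanh(self.visual_proj(prod_visual))
        return torch.tanh(self.fuse(torch.cat([t, v], dim=-1)))

    def score(self, user_ids, query_text, prod_text, prod_visual):
        # Personalized search target: user preference plus query intent
        # (additive combination is an assumption for this sketch).
        u = self.user_emb(user_ids)
        q = torch.tanh(self.query_proj(query_text))
        target = u + q
        p = self.product_repr(prod_text, prod_visual)
        return (target * p).sum(dim=-1)  # inner-product matching


def pairwise_loss(model, user_ids, query_text, pos_text, pos_visual,
                  neg_text, neg_visual):
    """BPR-style comparative loss: the purchased (positive) product should
    score higher than a sampled negative for the same (user, query) pair."""
    pos = model.score(user_ids, query_text, pos_text, pos_visual)
    neg = model.score(user_ids, query_text, neg_text, neg_visual)
    return -F.logsigmoid(pos - neg).mean()

At inference time, the same score function can rank all candidate products for a (user, query) pair in the shared latent space, which is what "direct matching" refers to above.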
