Learning a Hierarchical Embedding Model for Personalized Product Search

Product search is an important part of online shopping. Unlike many search tasks, the objective of product search is not confined to retrieving relevant products; it is to find items that satisfy the needs of individual users and lead to a purchase. These characteristics make search personalization essential for both customers and e-shopping companies. Purchase behavior is highly personal in online shopping, and users often provide rich feedback about their decisions (e.g., product reviews). However, the severe mismatch in the language of queries, products and users makes traditional retrieval models based on bag-of-words assumptions less suitable for personalization in product search. In this paper, we propose a hierarchical embedding model that learns semantic representations for entities (i.e., words, products, users and queries) at different levels from their associated language data. Our contributions are three-fold: (1) our work is one of the first studies on personalized product search; (2) our hierarchical embedding model is the first latent space model that jointly learns distributed representations for queries, products and users with a deep neural network; (3) each component of our network is designed as a generative model, so that the whole structure is explainable and extendable. Following the methodology of previous studies, we constructed personalized product search benchmarks from Amazon product data. Experiments show that our hierarchical embedding model significantly outperforms existing product search baselines on multiple benchmark datasets.
