Embedding-based Product Retrieval in Taobao Search

Product search on e-commerce platforms has become a vital shopping channel in people's lives. The retrieval phase largely determines the quality of the search system and has gradually attracted researchers' attention. Retrieving the most relevant products from a large-scale corpus while preserving personalized user characteristics remains an open question. Recent approaches in this domain have mainly focused on embedding-based retrieval (EBR) systems. However, after a long period of practice on Taobao, we find that the performance of the EBR system is dramatically degraded by (1) low relevance to the given query and (2) a discrepancy between the training and inference phases. We therefore propose a novel and practical embedding-based product retrieval model, named Multi-Grained Deep Semantic Product Retrieval (MGDSPR). Specifically, we first identify the inconsistency between the training and inference stages, and then adopt the softmax cross-entropy loss as the training objective, which achieves better performance and faster convergence. We further propose two efficient methods to improve retrieval relevance: smoothing noisy training data, and generating relevance-improving hard negative samples without requiring extra knowledge or training procedures. We evaluate MGDSPR on Taobao Product Search and observe significant metric gains in both offline experiments and online A/B tests. MGDSPR has been successfully deployed in the existing multi-channel retrieval system of Taobao Search. We also introduce the online deployment scheme and share practical lessons from our retrieval system to contribute to the community.
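As a rough illustration of the softmax cross-entropy objective mentioned above, the following minimal sketch scores each query against every item in a small batch and treats the paired item as the positive. It is not the MGDSPR implementation: the function names, the use of in-batch negatives, the L2 normalization, and the `temperature` parameter are all assumptions made for this example.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def l2_normalize(v):
    """Scale a vector to unit length (assumed here so scores are cosine similarities)."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v] if norm > 0 else list(v)


def softmax_cross_entropy(query_vecs, item_vecs, temperature=1.0):
    """Mean softmax cross-entropy over a batch.

    Item i is the positive for query i; every other item in the batch
    acts as a negative (an assumption of this sketch, not of the paper).
    """
    queries = [l2_normalize(q) for q in query_vecs]
    items = [l2_normalize(p) for p in item_vecs]
    total = 0.0
    for i, q in enumerate(queries):
        # Hypothetical temperature: rescales the logits before the softmax.
        logits = [dot(q, p) / temperature for p in items]
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(logits)
        log_z = m + math.log(sum(math.exp(s - m) for s in logits))
        total += log_z - logits[i]
    return total / len(queries)


# Toy batch: two queries and two items in a 2-dimensional embedding space.
queries = [[1.0, 0.0], [0.0, 1.0]]
items = [[1.0, 0.0], [0.0, 1.0]]
loss_aligned = softmax_cross_entropy(queries, items)
loss_swapped = softmax_cross_entropy(queries, items[::-1])
```

In this toy batch the loss is lower when each query is paired with its matching item (`loss_aligned`) than when the pairs are swapped (`loss_swapped`), which is the behavior a retrieval objective of this kind is expected to reward.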
