Light feed-forward networks for shard selection in large-scale product search

Large-scale information retrieval systems store documents in different shards. Shard selection enables cost-effective retrieval by searching only the shards relevant to the query. Most existing shard selection algorithms focus on web search and rely on text similarity between the query and shard corpora. In contrast, in e-commerce product search, shards are defined according to product categories, and most queries imply product category intent. Such characteristics are yet to be leveraged for shard selection. In this work, we formulate shard selection in product search as a multi-label query intent classification problem. We show that light feed-forward neural networks, with language-independent features, suffice to achieve high performance for this recall-oriented task. The simple architecture allows for low-latency shard selection early in the retrieval process. We evaluate the model in terms of cost reduction and impact on the relevance of retrieved documents, both in offline simulation and online A/B testing. Without degrading customer experience, we achieve a double-digit percentage reduction in search engine cost in multiple locales, and the model has been deployed to serve Amazon Search customers worldwide.
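To make the setup concrete, below is a minimal sketch of a light feed-forward, multi-label shard selector over hashed, language-independent query features. It is not the paper's exact architecture: the bucket count, hidden size, shard count, character-trigram hashing, and the recall-oriented score threshold are all illustrative assumptions.

```python
# Minimal sketch (not the deployed model): a one-hidden-layer feed-forward
# network that scores each product-category shard independently (multi-label),
# using hashed character trigrams as language-independent features.
import torch
import torch.nn as nn

NUM_BUCKETS = 2 ** 16   # size of the hashed feature space (assumption)
HIDDEN_DIM = 256        # single small hidden layer keeps inference latency low
NUM_SHARDS = 40         # one output per product-category shard (assumption)

def hash_query(query: str, num_buckets: int = NUM_BUCKETS) -> torch.Tensor:
    """Map a raw query to a multi-hot vector of hashed character trigrams.
    A production system would use a stable hash; Python's hash() is only
    for illustration."""
    vec = torch.zeros(num_buckets)
    padded = f"##{query.lower()}##"
    for i in range(len(padded) - 2):
        vec[hash(padded[i:i + 3]) % num_buckets] = 1.0
    return vec

class ShardSelector(nn.Module):
    """Feed-forward network with sigmoid outputs: each shard is an
    independent binary label, so a query can map to multiple shards."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(NUM_BUCKETS, HIDDEN_DIM),
            nn.ReLU(),
            nn.Linear(HIDDEN_DIM, NUM_SHARDS),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.sigmoid(self.net(x))

# Recall-oriented selection: search every shard whose score clears a low
# threshold, so relevant shards are rarely missed (threshold is an assumption).
model = ShardSelector()
scores = model(hash_query("wireless noise cancelling headphones").unsqueeze(0))
selected = (scores > 0.1).nonzero(as_tuple=True)[1].tolist()
print("shards to search:", selected)
```

In practice such a model would be trained with a per-shard binary cross-entropy loss on query-to-category engagement data, and the threshold tuned so that cost savings do not come at the expense of recall.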
