Query Rewriting using Automatic Synonym Extraction for E-commerce Search

Query rewriting is a critical component of modern search engines. It is the process of altering and enhancing raw user queries with synonymous keywords or structured metadata to improve search recall and relevance, using data mining techniques applied to textual data and user behavioral signals. For example, the query bicycle is rewritten to match (bicycle OR bike), i.e., all items whose titles contain either the word bicycle or the word bike are returned for this query. Choosing the right set of synonymous terms for a given query term is essential to the quality of search results, especially in e-commerce, where buyer needs can be very specific. For example, shoe is a good synonym for shoes, whereas sandals, while related, is not.

In this work, we describe one version of the approaches to query rewriting taken at eBay search. At a high level, we use a two-step process to generate and apply synonyms for query expansion: (1) offline token-level synonym generation and (2) runtime search query rewriting. In the offline phase, we first generate a large pool of candidate synonyms for query tokens using various techniques that leverage user behavioral data, inventory and taxonomy information, and open-source knowledge bases. We then apply a machine-learned binary classifier, trained on human-judged binary relevance labels, to keep only the candidate synonyms that are truly useful as query expansions without compromising result-set precision. Because the classifier provides a scientific and scalable way to evaluate each candidate's effectiveness for query rewriting, it allows us to draw on a wide variety of sources and techniques when generating candidates. The filtered set of token-level synonyms is stored in a dictionary for runtime query rewriting. In the online phase, we rewrite user search queries by combining the token-level synonyms in the dictionary into a boolean recall expression.
We empirically demonstrate the value of this approach to enhance e-commerce search recall and relevance.
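The two phases can be illustrated with a minimal Python sketch. It is not eBay's implementation: it assumes each candidate pair already carries a score from the relevance classifier (the classifier itself is not shown), uses a hypothetical acceptance threshold of 0.5, tokenizes queries by lowercasing and splitting on whitespace, and assumes the boolean recall expression is an AND of per-token OR-groups, generalizing the (bicycle OR bike) example above. The names `Candidate`, `build_synonym_dictionary`, and `rewrite_query` are illustrative only.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Candidate:
    """Hypothetical candidate pair with a precomputed classifier score."""
    token: str
    synonym: str
    score: float


def build_synonym_dictionary(
    candidates: Iterable[Candidate], threshold: float = 0.5
) -> Dict[str, List[str]]:
    """Offline phase: keep only candidates whose score clears the (assumed) threshold."""
    dictionary: Dict[str, List[str]] = {}
    for c in candidates:
        if c.score < threshold or c.synonym == c.token:
            continue  # rejected by the filter, or a trivial self-mapping
        synonyms = dictionary.setdefault(c.token, [])
        if c.synonym not in synonyms:  # avoid duplicate expansions, keep insertion order
            synonyms.append(c.synonym)
    return dictionary


def rewrite_query(query: str, dictionary: Dict[str, List[str]]) -> str:
    """Online phase: one OR-group per token (token plus its synonyms), ANDed together."""
    clauses: List[str] = []
    for token in query.lower().split():
        alternatives = [token] + dictionary.get(token, [])
        if len(alternatives) == 1:
            clauses.append(token)  # no accepted synonyms: match the token as-is
        else:
            clauses.append("(" + " OR ".join(alternatives) + ")")
    return " AND ".join(clauses)


# Illustrative scores only; they are not taken from the paper.
candidates = [
    Candidate("bicycle", "bike", 0.93),
    Candidate("shoes", "shoe", 0.88),
    Candidate("shoes", "sandals", 0.12),  # related, but filtered out
]
synonyms = build_synonym_dictionary(candidates)
print(rewrite_query("red bicycle", synonyms))    # red AND (bicycle OR bike)
print(rewrite_query("running shoes", synonyms))  # running AND (shoes OR shoe)
```

Keeping the filter as a simple score threshold separates candidate generation from candidate evaluation, which mirrors the abstract's point that new synonym sources can be added without changing how they are judged.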
