Dataset of Natural Language Queries for E-Commerce

Online shopping is increasingly common in everyday life. For e-commerce search systems, understanding natural language input from voice assistants, chatbots, or conversational search is essential for determining what the user really wants. However, no evaluation datasets suitable for research capture the natural, detailed information needs of product seekers. Because of privacy concerns and competitive considerations, only a few datasets of real user queries from search logs are openly available. In this paper, we present a dataset of 3,540 natural language queries in two domains, laptops and jackets, in which users describe the product they are looking for. For 1,754 of the laptop queries, the dataset also includes annotations of vague terms and key facts. This dataset opens up a range of research opportunities in natural language processing and (interactive) information retrieval for product search.
