Query Understanding for Surfacing Under-served Music Content

Platform ecosystems have witnessed explosive growth by facilitating interactions between consumers and suppliers. Search systems powering such platforms play an important role in surfacing content to users. To maintain a healthy, sustainable platform, system designers often need to explicitly consider exposing under-served content to users, that is, content which might otherwise remain undiscovered. In this work, we consider the question of when we might surface under-served content in search results, and investigate ways to provide exposure to certain content groups. We propose a framework for developing query understanding techniques that identify potential non-focused search queries on a music streaming platform, where users' information needs are non-specific enough to expose under-served content without severely impacting user satisfaction. We present insights from a search ranker deployed at scale, along with results from a live A/B test targeting a random sample of 72 million users and 593 million sessions, comparing the performance of the different methods considered for identifying non-focused queries for surfacing under-served content.
