Measuring User Satisfaction on Smart Speaker Intelligent Assistants Using Intent Sensitive Query Embeddings

Intelligent assistants are increasingly being used on smart speaker devices, such as Amazon Echo, Google Home, Apple HomePod, and Harman Kardon Invoke with Cortana. Typically, user satisfaction measurement relies on user interaction signals, such as clicks and scroll movements, to determine whether a user was satisfied. However, these signals do not exist for smart speakers, which makes user satisfaction evaluation on these devices challenging. In this paper, we propose a new signal, user intent, as a means to measure user satisfaction. We use this signal to model user satisfaction in two ways: 1) by developing intent-sensitive word embeddings and then using sequences of the resulting intent-sensitive query representations to measure user satisfaction; 2) by representing a user's interactions with a smart speaker as a sequence of user intents and then using this sequence to identify user satisfaction. Our experimental results indicate that our proposed user satisfaction models based on the intent-sensitive query representations achieve statistically significant improvements over several baselines in terms of common classification evaluation metrics. In particular, our proposed task satisfaction prediction model based on intent-sensitive word embeddings achieves an 11.81% improvement over a generative model baseline and a 6.63% improvement over a user satisfaction prediction model based on Skip-gram word embeddings in terms of the F1 metric. Our findings have implications for the evaluation of intelligent assistant systems.
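To make the two input representations concrete, the following is a minimal sketch and not the implementation used in the paper. It assumes a toy lookup table keyed by (word, intent) pairs and averages word vectors to form a query representation. The table `EMBEDDINGS`, the functions `embed_query` and `encode_session`, the intent labels, and the averaging step are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Tuple

Vector = List[float]

# Hypothetical toy table: the same word gets a different vector per intent.
EMBEDDINGS: Dict[Tuple[str, str], Vector] = {
    ("play", "music"): [1.0, 0.0],
    ("play", "video"): [0.0, 1.0],
    ("jazz", "music"): [0.8, 0.2],
}


def embed_query(words: List[str], intent: str, dim: int = 2) -> Vector:
    """Average the intent-conditioned vectors of the known words in a query.

    Averaging is an assumption of this sketch; unknown (word, intent)
    pairs are skipped, and an all-zero vector is returned if none match.
    """
    vectors = [EMBEDDINGS[(w, intent)] for w in words if (w, intent) in EMBEDDINGS]
    if not vectors:
        return [0.0] * dim
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def encode_session(session: List[Tuple[str, str]]) -> Tuple[List[str], List[Vector]]:
    """Turn a session of (query, intent) pairs into the two sequences:
    the sequence of intents and the sequence of query representations."""
    intents = [intent for _, intent in session]
    query_vectors = [embed_query(query.split(), intent) for query, intent in session]
    return intents, query_vectors


# Example session with hypothetical queries and intent labels.
session = [("play jazz", "music"), ("play", "video")]
intents, query_vectors = encode_session(session)
```

Either sequence could then be passed to a sequence classifier that predicts whether the user was satisfied; that classifier is outside the scope of this sketch.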
