Evaluating Personal Assistants on Mobile Devices

The iPhone was introduced only a decade ago, in 2007, but it has fundamentally changed the way we interact with online information. Mobile devices differ radically from classic command-based and point-and-click user interfaces: they allow gesture-based interaction using fine-grained touch and swipe signals. With the rapid growth in the use of voice-controlled intelligent personal assistants such as Microsoft's Cortana, Google Now, and Apple's Siri, mobile devices have become personal: they allow us to be online all the time and assist us in any task, both at work and in our daily lives, which makes context a crucial factor to consider. Mobile usage now exceeds desktop usage and is still growing rapidly, yet our main ways of training and evaluating personal assistants are still based on (and framed in) classical desktop interactions, focusing on explicit queries, clicks, and dwell time. Modern user interaction with mobile devices, however, is radically different because of touch screens with gesture- and voice-based control and the varying context of use (e.g., in a car or on a bike), which often invalidates the assumptions underlying today's user satisfaction evaluation. There is an urgent need to understand voice- and gesture-based interaction, taking all interaction signals and context into account in appropriate ways. We propose a research agenda for developing methods to evaluate and improve context-aware user satisfaction with mobile interactions using gesture-based signals at scale.
