User Modeling for a Personal Assistant

We present a user modeling system that serves as the foundation of a personal assistant. The system ingests the web search history of signed-in users and identifies coherent contexts that correspond to tasks, interests, and habits. Unlike past work, which focused on either in-session tasks or tasks spanning a few days, we examine several months of history in order to identify not only short-term tasks but also long-term interests and habits. The features we use for identifying coherent contexts yield substantially higher precision and recall than past work. We also present an algorithm for identifying contexts that is 8 to 30 times faster than previous algorithms. The user modeling system has been deployed in production. It runs over hundreds of millions of users and updates the models with a 10-minute latency. The contexts identified by the system serve as the foundation for generating recommendations in Google Now.
