Learning from User Interactions in Personal Search via Attribute Parameterization

User interaction data (e.g., click data) has proven to be a powerful signal for learning-to-rank models in web search. However, such models require observing multiple interactions across many users for the same query-document pair to achieve statistically meaningful gains. Utilizing user interaction data to improve search over personal, rather than public, content is therefore a challenging problem. First, the documents (e.g., emails or private files) are not shared across users. Second, user search queries are of a personal nature (e.g., "alice's address") and may not generalize well across users. In this paper, we propose a solution to these challenges: projecting user queries and documents into a multi-dimensional space of fine-grained and semantically coherent attributes. We then introduce a novel parameterization technique to overcome sparsity in this multi-dimensional attribute space. Attribute parameterization enables effective use of cross-user interactions for improving personal search quality, which is, to the best of our knowledge, the first published result of its kind. Experiments with a dataset derived from interactions of users of one of the world's largest personal search engines demonstrate the effectiveness of the proposed attribute parameterization technique.
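
To illustrate why attributes allow clicks to be shared across users, the sketch below aggregates click feedback per (query attribute, document attribute) pair instead of per query-document pair. It is not the parameterization proposed in the paper. The attribute names, the function `attribute_ctr`, and the smoothing constants `prior_ctr` and `prior_weight` are all hypothetical, and the additive smoothing is only a simple stand-in for handling sparse attribute pairs.

```python
from collections import defaultdict


def attribute_ctr(impressions, prior_ctr=0.1, prior_weight=10.0):
    """Estimate a smoothed click-through rate per attribute pair.

    impressions: iterable of (query_attrs, doc_attrs, clicked) tuples, where
    query_attrs and doc_attrs are lists of attribute strings (hypothetical
    names) and clicked is a bool. Private documents never repeat across
    users, but their attributes do, so counts can be pooled.
    """
    clicks = defaultdict(float)
    views = defaultdict(float)
    for query_attrs, doc_attrs, clicked in impressions:
        for q_attr in query_attrs:
            for d_attr in doc_attrs:
                views[(q_attr, d_attr)] += 1.0
                if clicked:
                    clicks[(q_attr, d_attr)] += 1.0
    # Additive smoothing pulls rarely seen pairs toward prior_ctr
    # (an assumed stand-in for dealing with sparsity).
    return {
        pair: (clicks[pair] + prior_ctr * prior_weight) / (views[pair] + prior_weight)
        for pair in views
    }


# Three impressions from different users over different private documents.
log = [
    (["category:address"], ["type:contact"], True),
    (["category:address"], ["type:contact"], True),
    (["category:address"], ["type:promotion"], False),
]
rates = attribute_ctr(log)
```

In this toy log no document is seen twice, yet the pair ("category:address", "type:contact") still accumulates two clicks, which is the kind of cross-user evidence that per-document counting cannot provide.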
