Domain Adaptation for Enterprise Email Search

In the enterprise email search setting, the same search engine often powers multiple enterprises from various industries: technology, education, manufacturing, etc. However, using the same global ranking model across different enterprises may result in suboptimal search quality, due to differences in their corpora and in their information needs. On the other hand, training an individual ranking model for each enterprise may be infeasible, especially for smaller institutions with limited data. To address this data challenge, in this paper we propose a domain adaptation approach that fine-tunes the global model to each individual enterprise. In particular, we propose a novel application of the Maximum Mean Discrepancy (MMD) approach to information retrieval, which attempts to bridge the gap between the global data distribution and the data distribution of a given individual enterprise. We conduct a comprehensive set of experiments on a large-scale email search engine and demonstrate that the MMD approach consistently improves search quality for multiple individual domains, both in comparison to the global ranking model and to several competitive domain adaptation baselines, including adversarial learning methods.
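The central quantity, MMD, measures how far apart two samples are once mapped into a kernel feature space. The standard biased empirical estimate of squared MMD between a global sample X and an enterprise sample Y is the mean of k(x, x') plus the mean of k(y, y') minus twice the mean of k(x, y). The sketch below is a minimal illustration and not the paper's implementation. The Gaussian kernel, the bandwidth `sigma`, and all helper names are our own assumptions. `combined_loss` is also hypothetical: it assumes the MMD term is added as a weighted regularizer to the ranking loss during fine-tuning, a pattern common in deep MMD-based domain adaptation. The abstract does not state the exact formulation.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def gaussian_kernel(a: Vector, b: Vector, sigma: float = 1.0) -> float:
    """Gaussian (RBF) kernel k(a, b) = exp(-||a - b||^2 / (2 * sigma^2)).

    The kernel choice and bandwidth are assumptions for illustration.
    """
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def _mean_kernel(xs: Sequence[Vector], ys: Sequence[Vector], sigma: float) -> float:
    """Average kernel value over all pairs (x, y) with x in xs and y in ys."""
    total = sum(gaussian_kernel(x, y, sigma) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd_squared(source: Sequence[Vector], target: Sequence[Vector],
                sigma: float = 1.0) -> float:
    """Biased empirical estimate of squared MMD between two samples.

    `source` could hold representations of global training examples and
    `target` representations from a single enterprise (hypothetical usage).
    """
    return (_mean_kernel(source, source, sigma)
            + _mean_kernel(target, target, sigma)
            - 2.0 * _mean_kernel(source, target, sigma))


def combined_loss(ranking_loss: float, source: Sequence[Vector],
                  target: Sequence[Vector], weight: float = 1.0,
                  sigma: float = 1.0) -> float:
    """Hypothetical fine-tuning objective: ranking loss plus weighted MMD penalty."""
    return ranking_loss + weight * mmd_squared(source, target, sigma)


if __name__ == "__main__":
    same = [[0.0, 0.0], [1.0, 1.0]]
    print(mmd_squared(same, same))                     # identical samples
    print(mmd_squared([[0.0], [0.1]], [[3.0], [3.1]])) # shifted samples
```

Identical samples give an MMD of zero. Samples from shifted distributions give a positive value, so minimizing the penalty during fine-tuning encourages the two sets of representations to align. In a real system this term would be computed on mini-batches of learned hidden representations inside a differentiable framework rather than on plain Python lists.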
