Bayesian Optimization for Optimizing Retrieval Systems

The effectiveness of information retrieval systems depends heavily on a large number of hyperparameters that need to be tuned. These hyperparameters range from the choice of system components, e.g., stopword lists, stemming methods, or retrieval models, to model parameters, such as k1 and b in BM25, or the number of query expansion terms. Grid search and random search, the dominant methods for finding an optimal system configuration, lack a strategy to guide the search through the hyperparameter space, which makes them inefficient and ineffective. In this paper, we propose to use Bayesian Optimization to jointly search and optimize over the hyperparameter space. Bayesian Optimization, a sequential decision-making method, suggests the next most promising configuration to test based on the retrieval effectiveness of the configurations examined so far. To demonstrate the efficiency and effectiveness of Bayesian Optimization, we conduct experiments on TREC collections and show that it outperforms manual tuning, grid search, and random search, both in the retrieval effectiveness of the configuration found and in the efficiency of finding that configuration.
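To illustrate the kind of sequential loop described above (this is a minimal sketch, not the paper's exact setup), the following Python code tunes BM25's k1 and b with a Gaussian-process surrogate and an Expected Improvement acquisition function. The function evaluate_map is a hypothetical placeholder for running retrieval on a TREC collection and computing MAP for a given configuration.

    # Minimal Bayesian Optimization sketch for tuning BM25's k1 and b.
    # evaluate_map is a hypothetical stand-in for a real retrieval run + MAP evaluation.
    import numpy as np
    from scipy.stats import norm
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import Matern

    def evaluate_map(k1, b):
        # Placeholder objective; replace with an actual retrieval run and trec_eval MAP.
        return -((k1 - 1.2) ** 2 + (b - 0.75) ** 2)

    bounds = np.array([[0.5, 3.0],   # k1 range
                       [0.1, 1.0]])  # b range

    rng = np.random.default_rng(42)
    X = rng.uniform(bounds[:, 0], bounds[:, 1], size=(5, 2))   # initial random configurations
    y = np.array([evaluate_map(k1, b) for k1, b in X])

    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)

    def expected_improvement(cands, gp, y_best, xi=0.01):
        # Expected Improvement for maximization over candidate configurations.
        mu, sigma = gp.predict(cands, return_std=True)
        sigma = np.maximum(sigma, 1e-9)
        z = (mu - y_best - xi) / sigma
        return (mu - y_best - xi) * norm.cdf(z) + sigma * norm.pdf(z)

    for _ in range(20):  # sequential decisions: 20 further evaluations
        gp.fit(X, y)
        cands = rng.uniform(bounds[:, 0], bounds[:, 1], size=(1000, 2))
        nxt = cands[np.argmax(expected_improvement(cands, gp, y.max()))]
        X = np.vstack([X, nxt])
        y = np.append(y, evaluate_map(*nxt))

    best = X[np.argmax(y)]
    print(f"best k1={best[0]:.2f}, b={best[1]:.2f}, objective={y.max():.4f}")

Each iteration fits the surrogate to the configurations evaluated so far and picks the candidate with the highest expected improvement, which is the sense in which the method "suggests the next most promising configuration".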
