On Preference Learning Based on Sequential Bayesian Optimization with Pairwise Comparison

User preference learning is in general a hard problem: individual preferences are typically unknown even to users themselves, while the space of choices is infinite. Here we study user preference learning from an information-theoretic perspective. We model preference learning as a system of two interacting sub-systems, one representing a user with his/her preferences and the other representing an agent that has to learn these preferences. The user and his/her behaviour are modeled by a parametric preference function. To learn the preferences efficiently and quickly reduce the search space, we propose an agent that interacts with the user to collect the most informative data for learning. The agent presents two proposals to the user for evaluation, and the user rates them according to his/her preference function. We show that the optimal agent strategy for data collection and preference learning results from a maximin optimization of the normalized weighted Kullback-Leibler (KL) divergence between the true and the agent-assigned predictive user-response distributions. The resulting value of the KL divergence, which we call the remaining system uncertainty (RSU), provides an efficient performance metric in the absence of ground truth. This metric characterises how well the agent can predict the user and, thus, the quality of the underlying learned user (preference) model. Our proposed agent comprises sequential mechanisms for user-model inference and proposal generation. To infer the user model (preference function), the agent uses Bayesian approximate inference. The data-collection strategy is to generate proposals whose responses most help resolve the uncertainty associated with predicting the user's responses. The efficiency of our approach is validated by numerical simulations.
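The interaction loop described above can be sketched in simulation. Everything in the sketch below is an illustrative assumption rather than the paper's actual model: the quadratic preference function, the logistic (Bradley-Terry-style) response model, the particle approximation of the posterior, and the variance-based acquisition score used as a simple stand-in for the paper's KL-based criterion.

```python
# A minimal sketch of pairwise preference learning, under assumed models
# (quadratic utility, logistic response, particle posterior) -- not the
# paper's actual formulation.
import numpy as np

rng = np.random.default_rng(0)

def utility(x, theta):
    # Hypothetical user preference function, peaked at theta.
    return -(x - theta) ** 2

def response_prob(x1, x2, theta, beta=5.0):
    # Probability the user prefers proposal x1 over x2 (Bradley-Terry style).
    return 1.0 / (1.0 + np.exp(-beta * (utility(x1, theta) - utility(x2, theta))))

true_theta = 0.3                       # known only to the simulated user
particles = rng.uniform(-1, 1, 500)    # particle approximation of the agent's posterior
weights = np.full(500, 1 / 500)

candidates = np.linspace(-1, 1, 41)
pairs = [(a, b) for a in candidates for b in candidates if a < b]

def predictive(x1, x2):
    # Agent's posterior-predictive probability that the user prefers x1.
    return float(np.sum(weights * response_prob(x1, x2, particles)))

def acquisition(x1, x2):
    # Variance of the per-particle response probabilities: a simple proxy
    # for how much the posterior disagrees about the outcome of this pair.
    p = response_prob(x1, x2, particles)
    return float(np.sum(weights * (p - np.sum(weights * p)) ** 2))

for _ in range(30):
    x1, x2 = max(pairs, key=lambda ab: acquisition(*ab))   # most informative pair
    y = rng.random() < response_prob(x1, x2, true_theta)   # simulated user response
    lik = response_prob(x1, x2, particles)                 # Bayesian weight update
    weights = weights * (lik if y else 1.0 - lik)
    weights /= weights.sum()

def kl_bernoulli(p, q, eps=1e-12):
    p, q = np.clip(p, eps, 1 - eps), np.clip(q, eps, 1 - eps)
    return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))

# RSU-style diagnostic: KL divergence between the true and the agent-assigned
# predictive response distributions, averaged over candidate pairs. In the
# paper this metric needs no ground truth; here we exploit the simulation.
rsu = float(np.mean([kl_bernoulli(response_prob(a, b, true_theta), predictive(a, b))
                     for a, b in pairs]))
est = float(np.sum(weights * particles))   # posterior-mean estimate of theta
```

After a few dozen informative comparisons, the posterior mean `est` should lie close to `true_theta` and `rsu` should be small, mirroring the role of the RSU as a training-progress metric.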
