Zeroth Order Non-convex Optimization with Dueling-Choice Bandits

We consider a novel setting of zeroth-order non-convex optimization in which, in addition to querying the function value at a given point, we can also duel two points and learn which one has the larger function value. We refer to this setting as optimization with dueling-choice bandits, since both direct queries and duels are available to the optimizer. We give the COMP-GP-UCB algorithm, based on GP-UCB (Srinivas et al., 2009): instead of directly querying the point with the maximum Upper Confidence Bound (UCB), we perform a constrained optimization and use comparisons to filter out suboptimal points. COMP-GP-UCB comes with a theoretical guarantee of $O(\frac{\Phi}{\sqrt{T}})$ on simple regret, where $T$ is the number of direct queries and $\Phi$ is an improved information gain corresponding to a comparison-based constraint set that restricts the search space for the optimum. In contrast, in the direct-query-only setting, $\Phi$ depends on the entire domain. Finally, we present experimental results that demonstrate the efficacy of our algorithm.
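The abstract does not spell out the steps of COMP-GP-UCB, so the following is only a loose illustration of the general idea it describes: use duels to restrict the search space, then maximize a UCB over the restricted set using direct queries. Everything specific here is an assumption made for the sketch and is not taken from the paper: the finite 1-D grid, the RBF kernel and its length scale, the fixed `beta`, the noise-free `duel` oracle, the toy `objective`, and the particular filtering rule (keep only grid points that win a duel against a reference point).

```python
import numpy as np

def objective(x):
    # Toy non-convex function on [0, 1] (an assumption for illustration).
    return np.sin(3 * x) + 0.5 * np.cos(7 * x)

def rbf_kernel(a, b, length_scale=0.2):
    diff = a[:, None] - b[None, :]
    return np.exp(-0.5 * (diff / length_scale) ** 2)

def gp_posterior(x_obs, y_obs, x_query, noise=1e-3):
    # Zero-mean GP posterior mean and standard deviation at x_query.
    K = rbf_kernel(x_obs, x_obs) + noise * np.eye(len(x_obs))
    K_s = rbf_kernel(x_obs, x_query)  # shape (n_obs, n_query)
    mean = K_s.T @ np.linalg.solve(K, y_obs)
    var = 1.0 - np.sum(K_s * np.linalg.solve(K, K_s), axis=0)
    return mean, np.sqrt(np.clip(var, 0.0, None))

def duel(f, x1, x2):
    # Hypothetical noise-free duel: returns the point with the larger value.
    return x1 if f(x1) >= f(x2) else x2

def comparison_constraint_set(f, x_grid, n_candidates, rng):
    # Pick a reference point as the winner of a small tournament of duels,
    # then keep only the grid points that beat (or tie) the reference.
    candidates = rng.choice(x_grid, size=n_candidates, replace=False)
    x_ref = candidates[0]
    for c in candidates[1:]:
        x_ref = duel(f, x_ref, c)
    keep = np.array([duel(f, x, x_ref) == x for x in x_grid])
    return x_grid[keep], x_ref

def constrained_gp_ucb(f, x_grid, n_queries=15, beta=2.0, seed=0):
    rng = np.random.default_rng(seed)
    feasible, x_ref = comparison_constraint_set(f, x_grid, 5, rng)
    # The first direct query is spent on the reference point.
    x_obs = np.array([x_ref])
    y_obs = np.array([f(x_ref)])
    for _ in range(n_queries - 1):
        mean, std = gp_posterior(x_obs, y_obs, feasible)
        # UCB maximization restricted to the comparison-based set.
        x_next = feasible[np.argmax(mean + beta * std)]
        x_obs = np.append(x_obs, x_next)
        y_obs = np.append(y_obs, f(x_next))
    return x_obs[np.argmax(y_obs)], feasible, x_ref

x_grid = np.linspace(0.0, 1.0, 101)
best, feasible, x_ref = constrained_gp_ucb(objective, x_grid)
print(f"kept {len(feasible)} of {len(x_grid)} grid points; best x = {best:.2f}")
```

With noise-free duels, the grid maximizer always survives the filter, so restricting the UCB search to `feasible` cannot exclude the optimum in this toy setting. Handling noisy duels would require a more careful construction than the one shown here.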

[1]  Adam Tauman Kalai,et al.  Online convex optimization in the bandit setting: gradient descent without a gradient , 2004, SODA '05.

[2]  C. D. Perttunen,et al.  Lipschitzian optimization without the Lipschitz constant , 1993 .

[3]  Hung Chen.  Lower Rate of Convergence for Locating a Maximum of a Function , 1988 .

[4]  Thorsten Joachims,et al.  The K-armed Dueling Bandits Problem , 2012, COLT.

[5]  M. de Rijke,et al.  Copeland Dueling Bandits , 2015, NIPS.

[6]  Leslie Pack Kaelbling,et al.  Practical Reinforcement Learning in Continuous Spaces , 2000, ICML.

[7]  Thorsten Joachims,et al.  Reducing Dueling Bandits to Cardinal Bandits , 2014, ICML.

[8]  Wataru Kumagai.  Regret Analysis for Continuous Dueling Bandit , 2017, NIPS.

[9]  Yang Yuan,et al.  Hyperparameter Optimization: A Spectral Approach , 2017, ICLR.

[10]  Andreas Krause,et al.  Information-Theoretic Regret Bounds for Gaussian Process Optimization in the Bandit Setting , 2009, IEEE Transactions on Information Theory.

[11]  L. Thurstone.  A law of comparative judgment , 1994 .

[12]  Joel W. Burdick,et al.  Multi-dueling Bandits with Dependent Arms , 2017, UAI.

[13]  Shifeng Xiong,et al.  Sequential Design and Analysis of High-Accuracy and Low-Accuracy Computer Codes , 2013, Technometrics.

[14]  James Theiler,et al.  Accelerated search for materials with targeted properties by adaptive design , 2016, Nature Communications.

[15]  Aditya Gopalan,et al.  On Kernelized Multi-armed Bandits , 2017, ICML.

[16]  Felix A Faber,et al.  Machine Learning Energies of 2 Million Elpasolite (ABC_{2}D_{6}) Crystals. , 2015, Physical review letters.

[17]  Kirthevasan Kandasamy,et al.  Multi-Fidelity Black-Box Optimization with Hierarchical Partitions , 2018, ICML.

[18]  Nicolas Vayatis,et al.  A ranking approach to global optimization , 2016, ICML.

[19]  Sivaraman Balakrishnan,et al.  Nonparametric Regression with Comparisons: Escaping the Curse of Dimensionality with Ordinal Information , 2018, ICML.

[20]  Robert D. Nowak,et al.  Query Complexity of Derivative-Free Optimization , 2012, NIPS.

[21]  Kirthevasan Kandasamy,et al.  Multi-fidelity Gaussian Process Bandit Optimisation , 2016, J. Artif. Intell. Res..

[22]  Nihar B. Shah,et al.  Active ranking from pairwise comparisons and when parametric assumptions do not help , 2016, The Annals of Statistics.

[23]  Furong Huang,et al.  Escaping From Saddle Points - Online Stochastic Gradient for Tensor Decomposition , 2015, COLT.

[24]  Sivaraman Balakrishnan,et al.  Optimization of Smooth Functions With Noisy Observations: Local Minimax Rates , 2018, IEEE Transactions on Information Theory.

[25]  Kirthevasan Kandasamy,et al.  Multi-fidelity Bayesian Optimisation with Continuous Approximations , 2017, ICML.

[26]  Omar Besbes,et al.  Dynamic Pricing Without Knowing the Demand Function: Risk Bounds and Near-Optimal Algorithms , 2009, Oper. Res..

[27]  Shachar Lovett,et al.  Active Classification with Comparison Queries , 2017, 2017 IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS).

[28]  Filip Radlinski,et al.  How does clickthrough data reflect retrieval quality? , 2008, CIKM '08.

[29]  Adam D. Bull,et al.  Convergence Rates of Efficient Global Optimization Algorithms , 2011, J. Mach. Learn. Res..

[30]  John N. Tsitsiklis,et al.  Linearly Parameterized Bandits , 2008, Math. Oper. Res..

[31]  R. A. Bradley,et al.  RANK ANALYSIS OF INCOMPLETE BLOCK DESIGNS THE METHOD OF PAIRED COMPARISONS , 1952 .

[32]  Yichong Xu,et al.  Noise-Tolerant Interactive Learning from Pairwise Comparisons with Near-Minimal Label Complexity , 2017 .

[33]  Yair Carmon,et al.  "Convex Until Proven Guilty": Dimension-Free Acceleration of Gradient Descent on Non-Convex Functions , 2017, ICML.