Zeroth Order Non-convex optimization with Dueling-Choice Bandits

We consider a novel setting of zeroth-order non-convex optimization in which, in addition to querying the function value at a given point, we can also duel two points and observe which one has the larger function value. We refer to this setting as optimization with dueling-choice bandits, since both direct queries and duels are available for optimization. We give the COMP-GP-UCB algorithm, based on GP-UCB (Srinivas et al., 2009). Instead of directly querying the point with the maximum Upper Confidence Bound (UCB), it performs a constrained optimization and uses comparisons to filter out suboptimal points. COMP-GP-UCB comes with a theoretical guarantee of $O(\frac{\Phi}{\sqrt{T}})$ on simple regret, where $T$ is the number of direct queries and $\Phi$ is an improved information gain corresponding to a comparison-based constraint set that restricts the search space for the optimum. In contrast, in the direct-query-only setting, $\Phi$ depends on the entire domain. Finally, we present experimental results that show the efficacy of our algorithm.
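The abstract describes COMP-GP-UCB only at a high level, so what follows is a rough illustration of the general idea, not the authors' algorithm. The sketch runs standard GP-UCB on a finite 1-D grid. Before spending a direct query on the highest-UCB point, it duels that point against the best point queried so far. A point that loses cannot be the maximizer, so it is dropped from the candidate set. Everything here is an assumption made for illustration: the squared-exponential kernel, the lengthscale, the noise jitter, `beta`, the noiseless `duel` oracle, and this pruning rule. All function names (`rbf`, `solve`, `posterior`, `duel`, `ucb_with_duels`) are hypothetical. The paper's actual comparison-based constraint set, and how it handles noisy duels, are not reproduced.

```python
import math


def rbf(a, b, length=0.2):
    """Squared-exponential kernel on scalars (illustrative choice)."""
    return math.exp(-((a - b) ** 2) / (2 * length ** 2))


def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [A[i][:] + [rhs[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= factor * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def posterior(X, y, x, noise=1e-2):
    """GP posterior mean and standard deviation at x given data (X, y)."""
    if not X:
        return 0.0, 1.0  # prior: zero mean, unit variance
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(X)]
         for i, a in enumerate(X)]
    k = [rbf(a, x) for a in X]
    alpha = solve(K, y)
    v = solve(K, k)
    mean = sum(ki * ai for ki, ai in zip(k, alpha))
    var = max(1.0 - sum(ki * vi for ki, vi in zip(k, v)), 1e-12)
    return mean, math.sqrt(var)


def duel(f, a, b):
    """Hypothetical noiseless dueling oracle: return the point with larger f.

    Only the winner is revealed to the caller, never the values themselves.
    """
    return a if f(a) >= f(b) else b


def ucb_with_duels(f, grid, T, beta=4.0):
    """GP-UCB on a finite grid, pruning candidates with duels (toy rule)."""
    X, y, duels = [], [], 0
    candidates = set(range(len(grid)))
    for _ in range(T):
        # Incumbent: the best point among the direct queries so far.
        best = X[y.index(max(y))] if y else None
        ucb = {}
        for i in candidates:
            m, s = posterior(X, y, grid[i])
            ucb[i] = m + math.sqrt(beta) * s
        # Walk candidates in decreasing UCB order until one survives a duel.
        # The incumbent itself is never pruned, so this loop always breaks.
        for i in sorted(candidates, key=ucb.get, reverse=True):
            if best is None or grid[i] == best:
                break
            duels += 1
            if duel(f, grid[i], best) == grid[i]:
                break
            candidates.discard(i)  # lost to an observed point: not the maximizer
        X.append(grid[i])
        y.append(f(grid[i]))  # exactly one direct query per round
    f_star = max(f(x) for x in grid)
    return {"best_x": X[y.index(max(y))], "regret": f_star - max(y),
            "queries": len(X), "duels": duels, "candidates": candidates}
```

For example, `ucb_with_duels(lambda x: -(x - 0.3) ** 2, [i / 49 for i in range(50)], T=10)` returns the best grid point found, its simple regret, the number of direct queries and duels spent, and the surviving candidate set. Pruning with duels shrinks the set over which the UCB is maximized. That loosely mirrors the abstract's point that comparisons restrict the search space for the optimum.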

[1] Adam Tauman Kalai, et al. Online convex optimization in the bandit setting: gradient descent without a gradient, 2004, SODA '05.

[2] C. D. Perttunen, et al. Lipschitzian optimization without the Lipschitz constant, 1993.

[3] Hung Chen. Lower Rate of Convergence for Locating a Maximum of a Function, 1988.

[4] Thorsten Joachims, et al. The K-armed Dueling Bandits Problem, 2012, COLT.

[5] M. de Rijke, et al. Copeland Dueling Bandits, 2015, NIPS.

[6] Leslie Pack Kaelbling, et al. Practical Reinforcement Learning in Continuous Spaces, 2000, ICML.

[7] Thorsten Joachims, et al. Reducing Dueling Bandits to Cardinal Bandits, 2014, ICML.

[8] Wataru Kumagai. Regret Analysis for Continuous Dueling Bandit, 2017, NIPS.

[9] Yang Yuan, et al. Hyperparameter Optimization: A Spectral Approach, 2017, ICLR.

[10] Andreas Krause, et al. Information-Theoretic Regret Bounds for Gaussian Process Optimization in the Bandit Setting, 2009, IEEE Transactions on Information Theory.

[11] L. Thurstone. A law of comparative judgment, 1994.

[12] Joel W. Burdick, et al. Multi-dueling Bandits with Dependent Arms, 2017, UAI.

[13] Shifeng Xiong, et al. Sequential Design and Analysis of High-Accuracy and Low-Accuracy Computer Codes, 2013, Technometrics.

[14] James Theiler, et al. Accelerated search for materials with targeted properties by adaptive design, 2016, Nature Communications.

[15] Aditya Gopalan, et al. On Kernelized Multi-armed Bandits, 2017, ICML.

[16] Felix A Faber, et al. Machine Learning Energies of 2 Million Elpasolite (ABC$_2$D$_6$) Crystals, 2015, Physical Review Letters.

[17] Kirthevasan Kandasamy, et al. Multi-Fidelity Black-Box Optimization with Hierarchical Partitions, 2018, ICML.

[18] Nicolas Vayatis, et al. A ranking approach to global optimization, 2016, ICML.

[19] Sivaraman Balakrishnan, et al. Nonparametric Regression with Comparisons: Escaping the Curse of Dimensionality with Ordinal Information, 2018, ICML.

[20] Robert D. Nowak, et al. Query Complexity of Derivative-Free Optimization, 2012, NIPS.

[21] Kirthevasan Kandasamy, et al. Multi-fidelity Gaussian Process Bandit Optimisation, 2016, J. Artif. Intell. Res.

[22] Nihar B. Shah, et al. Active ranking from pairwise comparisons and when parametric assumptions do not help, 2016, The Annals of Statistics.

[23] Furong Huang, et al. Escaping From Saddle Points: Online Stochastic Gradient for Tensor Decomposition, 2015, COLT.

[24] Sivaraman Balakrishnan, et al. Optimization of Smooth Functions With Noisy Observations: Local Minimax Rates, 2018, IEEE Transactions on Information Theory.

[25] Kirthevasan Kandasamy, et al. Multi-fidelity Bayesian Optimisation with Continuous Approximations, 2017, ICML.

[26] Omar Besbes, et al. Dynamic Pricing Without Knowing the Demand Function: Risk Bounds and Near-Optimal Algorithms, 2009, Oper. Res.

[27] Shachar Lovett, et al. Active Classification with Comparison Queries, 2017, IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS).

[28] Filip Radlinski, et al. How does clickthrough data reflect retrieval quality?, 2008, CIKM '08.

[29] Adam D. Bull, et al. Convergence Rates of Efficient Global Optimization Algorithms, 2011, J. Mach. Learn. Res.

[30] John N. Tsitsiklis, et al. Linearly Parameterized Bandits, 2008, Math. Oper. Res.

[31] R. A. Bradley, et al. Rank Analysis of Incomplete Block Designs: The Method of Paired Comparisons, 1952.

[32] Yichong Xu, et al. Noise-Tolerant Interactive Learning from Pairwise Comparisons with Near-Minimal Label Complexity, 2017.

[33] Yair Carmon, et al. "Convex Until Proven Guilty": Dimension-Free Acceleration of Gradient Descent on Non-Convex Functions, 2017, ICML.