SCOBO: Sparsity-Aware Comparison Oracle Based Optimization

We study derivative-free optimization of convex functions under the further assumption that function evaluations are unavailable. Instead, one has access only to a comparison oracle. Given two points $x$ and $y$, this oracle returns a single bit indicating which of $f(x)$ and $f(y)$ is larger, and that bit may be incorrect with some probability. This probability may be constant, or it may depend on $|f(x)-f(y)|$. Previous algorithms for this problem have been hampered by a query complexity that depends polynomially on the problem dimension $d$. We propose a novel algorithm that breaks this dependence: when the function additionally has low-dimensional structure that can be exploited, its query complexity depends only logarithmically on $d$. Numerical experiments on synthetic data and the MuJoCo dataset show that our algorithm outperforms state-of-the-art methods for comparison-based optimization. It is even competitive with other derivative-free algorithms that require explicit function evaluations.
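To make the oracle model concrete, here is a minimal Python sketch of a noisy comparison oracle. It is not the authors' implementation. The names `comparison_oracle` and `sphere` and the parameters `p_flip`, `delta0`, `mu` and `kappa` are hypothetical. The gap-dependent error probability $\tfrac12 - \min(\delta_0, \mu |f(y)-f(x)|^{\kappa-1})$ is an illustrative assumption. It is one common way to let the error depend on $|f(x)-f(y)|$, so that close calls are harder, and it is not necessarily the exact model used in the paper.

```python
import random
from typing import Callable, Optional, Sequence

Vector = Sequence[float]


def comparison_oracle(
    f: Callable[[Vector], float],
    x: Vector,
    y: Vector,
    rng: random.Random,
    p_flip: Optional[float] = None,
    delta0: float = 0.3,
    mu: float = 1.0,
    kappa: float = 2.0,
) -> int:
    """Noisy comparison oracle: returns +1 if it reports f(y) > f(x), else -1.

    Only this single bit is revealed to the optimizer; f(x) and f(y) stay hidden.
    """
    gap = f(y) - f(x)
    truth = 1 if gap > 0 else -1
    if p_flip is not None:
        # Constant noise: the bit is wrong with a fixed probability p_flip < 1/2.
        err = p_flip
    else:
        # Gap-dependent noise (illustrative assumption, requires kappa >= 1):
        # the error probability tends to 1/2 as |f(x) - f(y)| shrinks and is
        # never smaller than 1/2 - delta0.
        err = 0.5 - min(delta0, mu * abs(gap) ** (kappa - 1))
    # Flip the true answer with probability err.
    return -truth if rng.random() < err else truth


def sphere(v: Vector) -> float:
    """Toy convex objective f(v) = ||v||^2, used only for the demo."""
    return sum(t * t for t in v)


if __name__ == "__main__":
    rng = random.Random(0)
    x, y = [1.0, 2.0], [0.5, 0.5]  # f(x) = 5.0 and f(y) = 0.5, so f(x) is larger
    bits = [comparison_oracle(sphere, x, y, rng, p_flip=0.1) for _ in range(1001)]
    # With an error probability below 1/2, a majority vote over repeated
    # independent queries recovers the correct answer (-1) with high probability.
    print("majority:", 1 if sum(bits) > 0 else -1)
```

The majority vote in the demo only shows that each bit carries usable information when the error probability stays below 1/2. The sketch also shows why gap-dependent noise is harder: as $f(x)$ and $f(y)$ approach each other, each bit approaches a fair coin flip.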