SCOBO: Sparsity-Aware Comparison Oracle Based Optimization

We study derivative-free optimization of convex functions in the setting where even function evaluations are unavailable. Instead, one has access only to a comparison oracle which, given two points $x$ and $y$, returns a single bit of information indicating which of $f(x)$ and $f(y)$ is larger, with some probability of being incorrect. This probability may be constant or may depend on $|f(x)-f(y)|$. Previous algorithms for this problem have been hampered by a query complexity that is polynomial in the problem dimension $d$. We propose a novel algorithm that breaks this dependence: its query complexity depends only logarithmically on $d$, provided the function additionally has low-dimensional structure that can be exploited. Numerical experiments on synthetic data and the MuJoCo benchmark show that our algorithm outperforms state-of-the-art methods for comparison-based optimization, and is even competitive with derivative-free algorithms that require explicit function evaluations.
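To make the oracle model concrete, the following is a minimal sketch of the noisy one-bit comparison oracle the abstract describes. The function name, the constant flip probability, and the test function are illustrative assumptions, not the paper's implementation; the paper also allows the error probability to depend on $|f(x)-f(y)|$.

```python
import random

def comparison_oracle(f, x, y, flip_prob=0.1):
    """Illustrative noisy comparison oracle (not the paper's code).

    Returns +1 if f(x) > f(y) and -1 otherwise, except that the
    answer is flipped with probability `flip_prob`. Here the flip
    probability is a constant for simplicity.
    """
    bit = 1 if f(x) > f(y) else -1
    if random.random() < flip_prob:
        bit = -bit
    return bit

# Example: a simple convex function (squared Euclidean norm).
f = lambda v: sum(t * t for t in v)

# With flip_prob=0 the oracle is exact: f([1,0]) = 1 < f([2,0]) = 4,
# so the oracle reports that y has the larger value.
print(comparison_oracle(f, [1.0, 0.0], [2.0, 0.0], flip_prob=0.0))  # -1
```

An optimizer in this model sees only such bits, never the values $f(x)$ themselves, which is what distinguishes comparison-based optimization from standard zeroth-order methods.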
