Transformer-Based Learned Optimization

In this paper, we propose a new approach to learned optimization. As is common in the literature, we represent the computation of the optimizer's update step with a neural network. The parameters of the optimizer are then learned on a set of training optimization tasks so that it performs minimization efficiently. Our main innovation is a new neural network architecture for the learned optimizer, inspired by the classic BFGS algorithm. As in BFGS, we estimate a preconditioning matrix as a sum of rank-one updates, but we use a transformer-based neural network to predict these updates jointly with the step length and direction. In contrast to several recent learned optimization approaches [28, 31], our formulation allows for conditioning across different dimensions of the parameter space of the target problem while remaining applicable to optimization tasks of variable dimensionality without retraining. We demonstrate the advantages of our approach on a benchmark composed of objective functions traditionally used for evaluating optimization algorithms, as well as on the real-world task of physics-based reconstruction of articulated 3D human motion.
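To make the idea of a preconditioner built from rank-one updates concrete, the following is a minimal sketch and not the architecture proposed in the paper. It assumes a preconditioner of the form P = I + sum_i u_i u_i^T, where the vectors u_i are simply passed in by the caller; in the paper they would be predicted by the transformer. The function names `apply_preconditioner` and `optimizer_step` and the identity initialization are hypothetical choices made for illustration.

```python
from typing import List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equally sized vectors."""
    return sum(x * y for x, y in zip(a, b))


def apply_preconditioner(rank_one_vectors: Sequence[Sequence[float]],
                         grad: Sequence[float]) -> Vector:
    """Compute (I + sum_i u_i u_i^T) @ grad without forming the matrix.

    Assumption: the preconditioner starts from the identity and each
    update is a symmetric rank-one term u_i u_i^T.
    """
    out = list(grad)
    for u in rank_one_vectors:
        coeff = dot(u, grad)  # scalar u_i^T grad
        out = [o + coeff * ui for o, ui in zip(out, u)]
    return out


def optimizer_step(x: Sequence[float],
                   grad: Sequence[float],
                   rank_one_vectors: Sequence[Sequence[float]],
                   step_length: float) -> Vector:
    """One preconditioned descent step: x - step_length * P @ grad."""
    direction = apply_preconditioner(rank_one_vectors, grad)
    return [xi - step_length * di for xi, di in zip(x, direction)]


if __name__ == "__main__":
    x = [1.0, 2.0]
    grad = [1.0, 0.0]
    updates = [[1.0, 1.0]]  # stand-in for predicted rank-one vectors
    print(optimizer_step(x, grad, updates, step_length=0.5))
```

Because the matrix is never materialized, each rank-one term costs time linear in the problem dimension, and the same code applies to inputs of any dimensionality. With an empty list of updates, the step reduces to plain gradient descent.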

[1] M. Andriluka, et al. Differentiable Dynamics for Articulated 3d Human Motion Reconstruction, 2022, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[2] Luke Metz, et al. Practical tradeoffs between memory, compute, and performance in learned optimizers, 2022, CoLLAs.

[3] Brandon Amos. Tutorial on amortized optimization for learning to optimize over continuous domains, 2022, ArXiv.

[4] Paul Vicol, et al. Unbiased Gradient Estimation in Unrolled Computation Graphs with Persistent Evolution Strategies, 2021, ICML.

[5] Samuel S. Schoenholz, et al. Gradients are Not All You Need, 2021, ArXiv.

[6] S. Fidler, et al. Physics-based Human Motion Estimation and Synthesis from Videos, 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[7] Samuel S. Schoenholz, et al. Learn2Hop: Learned Optimization on Rough Landscapes, 2021, ICML.

[8] Christian Theobalt, et al. Neural monocular 3D human motion capture with physical awareness, 2021, ACM Trans. Graph.

[9] Kris Kitani, et al. SimPoE: Simulated Character Control for 3D Human Pose Estimation, 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[10] W. Yin, et al. Learning to Optimize: A Primer and A Benchmark, 2021, J. Mach. Learn. Res.

[11] G. Sukhatme, et al. NeuralSim: Augmenting Differentiable Simulators with Neural Networks, 2020, 2021 IEEE International Conference on Robotics and Automation (ICRA).

[12] J. Sohl-Dickstein, et al. Reverse engineering learned optimizers reveals known and novel mechanisms, 2020, NeurIPS.

[13] Eduard Gabriel Bazavan, et al. Neural Descent for Visual 3D Human Pose and Shape, 2020, Computer Vision and Pattern Recognition.

[14] Jascha Sohl-Dickstein, et al. Tasks, stability, architecture, and compute: Training more effective learned optimizers, and using them to train themselves, 2020, ArXiv.

[15] Christian Theobalt, et al. PhysCap, 2020, ACM Trans. Graph.

[16] Jie Song, et al. Human Body Model Fitting by Learned Gradient Descent, 2020, ECCV.

[17] Leonidas J. Guibas, et al. Contact and Human Dynamics from Monocular Video, 2020, SCA.

[18] Cristian Sminchisescu, et al. GHUM & GHUML: Generative 3D Human Shape and Articulated Pose Models, 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[19] Cristian Sminchisescu, et al. Weakly Supervised 3D Human Pose and Shape Reconstruction with Normalizing Flows, 2020, ECCV.

[20] Michael J. Black, et al. VIBE: Video Inference for Human Body Pose and Shape Estimation, 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[21] Joel Nothman, et al. SciPy 1.0-Fundamental Algorithms for Scientific Computing in Python, 2019, ArXiv.

[22] Nicolas Mansard, et al. Estimating 3D Motion and Forces of Person-Object Interactions From Monocular Video, 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[23] Jeremy Nixon, et al. Understanding and correcting pathologies in the training of learned optimizers, 2018, ICML.

[24] Richard S. Zemel, et al. Aggregated Momentum: Stability Through Passive Damping, 2018, ICLR.

[25] Satoru Fukayama, et al. AIST Dance Video Database: Multi-Genre, Multi-Dancer, and Multi-Camera Database for Dance Information Processing, 2019, ISMIR.

[26] Noam Shazeer, et al. Adafactor: Adaptive Learning Rates with Sublinear Memory Cost, 2018, ICML.

[27] Renjie Liao, et al. Understanding Short-Horizon Bias in Stochastic Meta-Optimization, 2018, ICLR.

[28] Warren Hare, et al. Best practices for comparing optimization algorithms, 2017, Optimization and Engineering.

[29] Lukasz Kaiser, et al. Attention is All you Need, 2017, NIPS.

[30] Misha Denil, et al. Learned Optimizers that Scale and Generalize, 2017, ICML.

[31] Marcin Andrychowicz, et al. Learning to learn by gradient descent by gradient descent, 2016, NIPS.

[32] Jitendra Malik, et al. Human Pose Estimation with Iterative Error Feedback, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[33] Ryan P. Adams, et al. Gradient-based Hyperparameter Optimization through Reversible Learning, 2015, ICML.

[34] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.

[35] Cristian Sminchisescu, et al. Human3.6M: Large Scale Datasets and Predictive Methods for 3D Human Sensing in Natural Environments, 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[36] G. Evans, et al. Learning to Optimize, 2008.

[37] Yoshua Bengio, et al. On the Optimization of a Synaptic Learning Rule, 2007.

[38] Rafael Martí, et al. Experimental Testing of Advanced Scatter Search Designs for Global Optimization of Multimodal Functions, 2005, J. Glob. Optim.

[39] Zelda B. Zabinsky, et al. A Numerical Evaluation of Several Stochastic Algorithms on Selected Continuous Global Optimization Test Problems, 2005, J. Glob. Optim.

[40] Jorge J. Moré, et al. Digital Object Identifier (DOI) 10.1007/s101070100263, 2001.

[41] Y. D. Sergeyev, et al. Global Optimization with Non-Convex Constraints - Sequential and Parallel Algorithms (Nonconvex Optimization and its Applications Volume 45), 2000.

[42] J. Doye, et al. Global Optimization by Basin-Hopping and the Lowest Energy Structures of Lennard-Jones Clusters Containing up to 110 Atoms, 1997, cond-mat/9803344.

[43] Heinz Mühlenbein, et al. The parallel genetic algorithm as function optimizer, 1991, Parallel Comput.

[44] Paul J. Werbos, et al. Backpropagation Through Time: What It Does and How to Do It, 1990, Proc. IEEE.

[45] R. Fletcher. Practical Methods of Optimization, 1988.

[46] D. Ackley. A connectionist machine for genetic hillclimbing, 1987.

[47] H. H. Rosenbrock, et al. An Automatic Method for Finding the Greatest or Least Value of a Function, 1960, Comput. J.