Learning to Optimize in Swarms

Learning to optimize has emerged as a powerful framework for various optimization and machine learning tasks. Current "meta-optimizers" of this kind often learn in the space of continuous optimization algorithms that are point-based and uncertainty-unaware. To overcome these limitations, we propose a meta-optimizer that learns in the algorithmic space of both point-based and population-based optimization algorithms. The meta-optimizer targets a meta-loss function consisting of both cumulative regret and entropy. Specifically, we learn and interpret the update formula through a population of LSTMs embedded with sample- and feature-level attentions. Meanwhile, we estimate the posterior directly over the global optimum and use an uncertainty measure to help guide the learning process. Empirical results on non-convex test functions and a protein-docking application demonstrate that this new meta-optimizer outperforms existing competitors. The code is publicly available at: https://github.com/Shen-Lab/LOIS.
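
To make the idea of a meta-loss that combines cumulative regret with entropy more concrete, the following is a minimal sketch and not the formulation used in the paper. It assumes the regret term is the sum of objective values over all iterations and particles, and it substitutes a Boltzmann distribution over the visited samples for the posterior over the global optimum. The function names, the weight `lam`, and the `temperature` parameter are all hypothetical.

```python
import numpy as np


def rastrigin(x):
    """A standard non-convex test function; its global minimum is 0 at the origin."""
    x = np.asarray(x, dtype=float)
    return 10.0 * x.shape[-1] + np.sum(x**2 - 10.0 * np.cos(2.0 * np.pi * x), axis=-1)


def cumulative_regret(values):
    """Assumed regret term: sum of objective values over iterations and particles."""
    return float(np.sum(values))


def boltzmann_entropy(values, temperature=1.0):
    """Entropy of a Boltzmann distribution over the visited samples.

    Assumption: this stands in for the entropy of a posterior over the
    global optimum; the paper's actual estimator is not reproduced here.
    """
    logits = -np.asarray(values, dtype=float).ravel() / temperature
    logits -= logits.max()  # shift for numerical stability
    p = np.exp(logits)
    p /= p.sum()
    return float(-np.sum(p * np.log(p + 1e-12)))


def meta_loss(values, lam=0.1):
    """Hypothetical meta-loss: regret plus a weighted entropy (uncertainty) term."""
    return cumulative_regret(values) + lam * boltzmann_entropy(values)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # values[t, k]: objective of particle k at iteration t (random points here,
    # in place of trajectories produced by a learned optimizer).
    points = rng.uniform(-5.12, 5.12, size=(10, 4, 2))
    values = rastrigin(points)
    print("meta-loss:", meta_loss(values))
```

In this sketch, a population whose samples concentrate on one low-valued point yields a low entropy, while samples with similar objective values yield an entropy close to the logarithm of the sample count.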
