Paper Information - Learning to Optimize in Swarms

Learning to Optimize in Swarms

Learning to optimize has emerged as a powerful framework for various optimization and machine learning tasks. Current "meta-optimizers" of this kind often learn in the space of continuous optimization algorithms that are point-based and uncertainty-unaware. To overcome these limitations, we propose a meta-optimizer that learns in the algorithmic space of both point-based and population-based optimization algorithms. The meta-optimizer targets a meta-loss function consisting of both cumulative regret and entropy. Specifically, we learn and interpret the update formula through a population of LSTMs embedded with sample- and feature-level attention. Meanwhile, we estimate the posterior directly over the global optimum and use an uncertainty measure to help guide the learning process. Empirical results on non-convex test functions and a protein-docking application demonstrate that this new meta-optimizer outperforms existing competitors. The code is publicly available at: https://github.com/Shen-Lab/LOIS.
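To make the idea of a meta-loss that combines cumulative regret and entropy more concrete, here is a minimal sketch in Python. It is one plausible reading of the abstract, not the authors' implementation. The function names, the trade-off weight `lam`, the choice of the best population member per iteration as the regret term, and the use of a discrete distribution as a stand-in for the posterior over the global optimum are all assumptions made for illustration.

```python
import numpy as np

def cumulative_regret(values, f_opt=0.0):
    """Sum over iterations of (best objective value in the population - f_opt).

    values: array of shape (T, K), objective values for K population
            members over T iterations (hypothetical layout).
    f_opt:  assumed known optimal value, used only for illustration.
    """
    values = np.asarray(values, dtype=float)
    best_per_step = values.min(axis=1)  # best member at each iteration
    return float(np.sum(best_per_step - f_opt))

def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution.

    Here the distribution is an assumed stand-in for a posterior
    over candidate locations of the global optimum.
    """
    p = np.asarray(probs, dtype=float)
    p = p / p.sum()        # normalize defensively
    p = p[p > 0]           # drop zero-probability entries (0 * log 0 = 0)
    return float(-np.sum(p * np.log(p)))

def meta_loss(values, probs_per_step, f_opt=0.0, lam=0.1):
    """Cumulative regret plus lam times the summed per-iteration entropy.

    lam is a hypothetical trade-off weight, not taken from the paper.
    """
    total_entropy = sum(entropy(p) for p in probs_per_step)
    return cumulative_regret(values, f_opt) + lam * total_entropy
```

In this sketch, a lower value means the population found good objective values early (low regret) and the belief about where the optimum lies became concentrated (low entropy). How the paper actually defines and weights the two terms should be checked against the paper and the linked repository.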
