Learning Discrete Energy-based Models via Auxiliary-variable Local Exploration

Discrete structures play an important role in applications like programming language modeling and software engineering. Current approaches to predicting complex structures typically rely on autoregressive models for their tractability, sacrificing some flexibility. Energy-based models (EBMs), on the other hand, offer a more flexible and thus more powerful approach to modeling such distributions, but require estimation of the partition function. In this paper we propose ALOE, a new algorithm for learning conditional and unconditional EBMs for discrete structured data, in which parameter gradients are estimated using a learned sampler that mimics local search. We show that the energy function and sampler can be trained efficiently via a new variational form of power iteration, achieving a better trade-off between flexibility and tractability. Experimentally, we show that learning local search leads to significant improvements in challenging application domains. Most notably, we present an energy-model-guided fuzzer for software testing that achieves performance comparable to well-engineered fuzzing engines like libFuzzer.
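To make the gradient estimation concrete: for an EBM p_θ(x) ∝ exp(−E_θ(x)), the maximum-likelihood gradient of log p_θ(x) is −∇_θ E_θ(x) plus the expected energy gradient under the model distribution, and ALOE approximates that model expectation with samples from the learned sampler instead of expensive MCMC. Below is a minimal PyTorch sketch of this contrastive update; the Energy network, the local_search_negatives stand-in (random bit flips in place of a trained local-search sampler), and all sizes and hyperparameters are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn as nn

class Energy(nn.Module):
    """Scores binary vectors; lower energy means higher probability."""
    def __init__(self, dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden), nn.ELU(), nn.Linear(hidden, 1))

    def forward(self, x):
        return self.net(x).squeeze(-1)

def local_search_negatives(x_pos, n_flips=4):
    # Stand-in for the learned sampler: perturb each data point with a few
    # random bit flips. In ALOE this proposal is itself a trained model that
    # mimics local search; random flips are used here only for brevity.
    x_neg = x_pos.clone()
    rows = torch.arange(x_neg.size(0))
    for _ in range(n_flips):
        cols = torch.randint(x_neg.size(1), (x_neg.size(0),))
        x_neg[rows, cols] = 1.0 - x_neg[rows, cols]
    return x_neg

dim = 32
energy = Energy(dim)
opt = torch.optim.Adam(energy.parameters(), lr=1e-3)

x_pos = (torch.rand(64, dim) > 0.5).float()  # placeholder "data" batch
x_neg = local_search_negatives(x_pos)        # negatives from the sampler

# Contrastive approximation of the maximum-likelihood gradient:
# push the energy of data down and the energy of sampled negatives up.
loss = energy(x_pos).mean() - energy(x_neg).mean()
opt.zero_grad()
loss.backward()
opt.step()

In the full algorithm the sampler is not fixed as above but is trained jointly with the energy function through the variational form of power iteration, so that its proposals track the current model distribution.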
