GFlowNet Foundations

Generative Flow Networks (GFlowNets) have been introduced as a method to sample a diverse set of candidates in an active learning context, with a training objective that makes them approximately sample in proportion to a given reward function. In this paper, we show a number of additional theoretical properties of GFlowNets. They can be used to estimate joint probability distributions and the corresponding marginal distributions where some variables are unspecified and, of particular interest, can represent distributions over composite objects like sets and graphs. GFlowNets amortize the work typically done by computationally expensive MCMC methods in a single, trained generative pass. They could also be used to estimate partition functions and free energies, conditional probabilities of supersets (supergraphs) given a subset (subgraph), as well as marginal distributions over all supersets (supergraphs) of a given set (graph). We introduce variations enabling the estimation of entropy and mutual information, sampling from a Pareto frontier, connections to reward-maximizing policies, and extensions to stochastic environments, continuous actions, and modular energy functions.
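
The central property claimed above (sampling in proportion to a reward, with the partition function recovered as the flow out of the initial state) can be made concrete with a minimal sketch. The toy environment, the reward, and all names below are illustrative assumptions rather than the paper's setup: on a tree-shaped DAG the state flows can be computed in closed form, whereas in general the flows must be learned, e.g., by minimizing a flow-matching objective over a DAG of states.

```python
import itertools
import random
from collections import Counter

# Toy domain: binary strings of length 3, built left to right by appending
# bits. In GFlowNet terms, each partial string is a state and each complete
# string is a terminal state x with reward R(x) > 0.
N = 3

def reward(x):
    # Hypothetical positive reward for illustration: strings with more 1s
    # are preferred.
    return 1.0 + 2.0 * sum(int(b) for b in x)

def flow(s):
    # Because this DAG is a tree, the flow through each state is determined
    # exactly: F(s) = sum of R(x) over all terminal states reachable from s.
    if len(s) == N:
        return reward(s)
    return flow(s + "0") + flow(s + "1")

Z = flow("")  # flow out of the initial state = partition function sum_x R(x)

def sample():
    # Forward policy P_F(s -> s') = F(s') / F(s). One sequential generative
    # pass constructs x with probability exactly R(x) / Z -- no MCMC chain.
    s = ""
    while len(s) < N:
        p0 = flow(s + "0") / flow(s)
        s += "0" if random.random() < p0 else "1"
    return s

# Compare empirical sampling frequencies against the target R(x) / Z.
counts = Counter(sample() for _ in range(100_000))
for x in sorted("".join(bits) for bits in itertools.product("01", repeat=N)):
    print(x, f"target={reward(x) / Z:.3f}", f"empirical={counts[x] / 100_000:.3f}")
```

Running this prints target probabilities R(x)/Z alongside matching empirical frequencies, illustrating how the amortized, single-pass sampler replaces an MCMC chain and how the root flow yields an estimate of the partition function.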
