GFlowNet-EM for learning compositional latent variable models

Latent variable models (LVMs) with discrete compositional latents are an important but challenging setting, because the latents can take a combinatorially large number of possible configurations. A key tradeoff in modeling the posteriors over latents is between expressivity and tractable optimization. For algorithms based on expectation-maximization (EM), the E-step is often intractable without restrictive approximations to the posterior. We propose using GFlowNets, algorithms that sample from an unnormalized density by learning a stochastic policy that constructs samples sequentially, for this intractable E-step. By training GFlowNets to sample from the posterior over latents, we exploit their strengths as amortized variational inference algorithms for complex distributions over discrete structures. Our approach, GFlowNet-EM, enables the training of expressive LVMs with discrete compositional latents, as demonstrated by experiments on non-context-free grammar induction and on images using discrete variational autoencoders (VAEs) without conditional independence enforced in the encoder.
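To make the proposed loop concrete, the sketch below alternates a GFlowNet E-step with a maximum-likelihood M-step on a toy model. This is an illustrative assumption, not the paper's implementation: the latent is a length-K sequence of discrete tokens built one token at a time (so the backward policy is trivial, since every partial sequence has a unique parent), the generative model is a small Gaussian decoder with a uniform prior over latents, and all names (decoder, policy, log_Z, em_step) are hypothetical. The E-step trains the sampler with a trajectory-balance-style squared loss whose reward is the unnormalized posterior p(x, z); the M-step fits the decoder on latents drawn from the amortized sampler.

```python
# Minimal GFlowNet-EM sketch (assumptions: length-K token-sequence latent,
# deterministic backward policy, Gaussian decoder, uniform prior).
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

K, V, X_DIM = 4, 8, 16  # latent length, token vocabulary size, observation dim

decoder = nn.Sequential(nn.Linear(K * V, 32), nn.ReLU(), nn.Linear(32, X_DIM))
policy = nn.Sequential(nn.Linear(X_DIM + K * V, 32), nn.ReLU(), nn.Linear(32, V))
log_Z = nn.Linear(X_DIM, 1)  # amortized log-partition estimate, conditioned on x

opt_model = torch.optim.Adam(decoder.parameters(), lr=1e-3)  # M-step optimizer
opt_gfn = torch.optim.Adam(
    list(policy.parameters()) + list(log_Z.parameters()), lr=1e-3  # E-step optimizer
)

def log_joint(x, z_onehot):
    """log p_theta(x, z): Gaussian likelihood around the decoded mean, uniform prior."""
    x_hat = decoder(z_onehot.flatten(1))
    log_lik = -0.5 * ((x - x_hat) ** 2).sum(-1)
    log_prior = -K * math.log(V)
    return log_lik + log_prior

def sample_z(x):
    """Sequentially construct z with the forward policy; return z and log P_F(z|x)."""
    B = x.shape[0]
    z = torch.zeros(B, K, V)
    log_pf = torch.zeros(B)
    for t in range(K):
        logits = policy(torch.cat([x, z.flatten(1)], -1))
        dist = torch.distributions.Categorical(logits=logits)
        a = dist.sample()
        log_pf = log_pf + dist.log_prob(a)
        z[torch.arange(B), t] = F.one_hot(a, V).float()
    return z, log_pf

def em_step(x):
    # E-step: a trajectory-balance squared loss pulls the sampler toward
    # p_theta(z|x); log P_B is zero because each prefix has a unique parent.
    z, log_pf = sample_z(x)
    tb_loss = ((log_Z(x).squeeze(-1) + log_pf - log_joint(x, z).detach()) ** 2).mean()
    opt_gfn.zero_grad(); tb_loss.backward(); opt_gfn.step()

    # M-step: maximize log p_theta(x, z) under samples from the amortized posterior.
    with torch.no_grad():
        z, _ = sample_z(x)
    m_loss = -log_joint(x, z).mean()
    opt_model.zero_grad(); m_loss.backward(); opt_model.step()

x = torch.randn(32, X_DIM)  # stand-in minibatch of observations
for _ in range(10):
    em_step(x)
```

Note that the E-step reward is the current model's joint likelihood, so it must be detached when training the sampler, and the two optimizers share no parameters. Stabilizers used in practice, such as exploratory sampling policies or multiple E-step updates per M-step, are omitted here for brevity.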
