Conditional Set Generation with Transformers

A set is an unordered collection of unique elements, and yet many machine learning models that generate sets impose an implicit or explicit ordering. Since model performance can depend on the choice of order, any particular ordering can lead to sub-optimal results. An alternative is to use a permutation-equivariant set generator, which does not specify an ordering. One example of such a generator is the Deep Set Prediction Network (DSPN). We introduce the Transformer Set Prediction Network (TSPN), a flexible permutation-equivariant model for set prediction based on the transformer, which builds upon and outperforms DSPN in the quality of predicted set elements and in the accuracy of their predicted sizes. We test our model on MNIST-as-point-clouds (SET-MNIST) for point-cloud generation and on CLEVR for object detection.
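The property a transformer-based set generator relies on is that self-attention without positional encodings is permutation-equivariant: permuting the input elements permutes the output in exactly the same way, so no ordering is ever imposed. A minimal NumPy sketch of this property (an illustration only, not the paper's TSPN implementation; the single-head attention with identity projections is an assumption for brevity):

```python
import numpy as np

def self_attention(X):
    # Single-head scaled dot-product self-attention with identity
    # query/key/value projections. With no positional encodings,
    # this layer is permutation-equivariant.
    scores = X @ X.T / np.sqrt(X.shape[1])
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)  # row-wise softmax
    return weights @ X

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 4))   # a "set" of 5 elements with 4 features
perm = rng.permutation(5)

out = self_attention(X)
out_perm = self_attention(X[perm])

# Permuting the input permutes the output identically:
assert np.allclose(out[perm], out_perm)
```

Because the attention weights depend only on pairwise dot products, reordering the rows of `X` reorders both the rows and columns of the weight matrix consistently, which is what makes the layer order-agnostic.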
