Universal Approximation of Functions on Sets

Modelling functions of sets, or equivalently, permutation-invariant functions, is a longstanding challenge in machine learning. Deep Sets is a popular method which is known to be a universal approximator for continuous set functions. We provide a theoretical analysis of Deep Sets which shows that this universal approximation property is only guaranteed if the model’s latent space is sufficiently high-dimensional. If the latent space is even one dimension lower than necessary, there exist piecewise-affine functions for which Deep Sets performs no better than a naïve constant baseline, as judged by worst-case error. Deep Sets may be viewed as the most efficient incarnation of the Janossy pooling paradigm. We identify this paradigm as encompassing most currently popular set-learning methods. Based on this connection, we discuss the implications of our results for set learning more broadly, and identify some open questions on the universality of Janossy pooling in general.
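To make the sum-decomposition behind Deep Sets concrete, below is a minimal sketch of the architecture in PyTorch, written as f(X) = ρ(Σᵢ φ(xᵢ)). The module names, layer sizes, and the `latent_dim` argument (the dimension of the latent space whose size the analysis above concerns) are illustrative assumptions, not the paper's implementation.

```python
# Minimal Deep Sets sketch (illustrative; not the paper's code).
import torch
import torch.nn as nn


class DeepSets(nn.Module):
    """Sum-decomposable set function: f(X) = rho(sum_i phi(x_i))."""

    def __init__(self, input_dim: int, latent_dim: int, hidden: int = 64):
        super().__init__()
        # phi: applied independently to every element of the set
        self.phi = nn.Sequential(
            nn.Linear(input_dim, hidden), nn.ReLU(), nn.Linear(hidden, latent_dim)
        )
        # rho: decodes the pooled, permutation-invariant latent representation
        self.rho = nn.Sequential(
            nn.Linear(latent_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x has shape (batch, set_size, input_dim); summing over the set axis
        # makes the output invariant to the ordering of the elements.
        pooled = self.phi(x).sum(dim=1)
        return self.rho(pooled)


# Permuting the set elements leaves the output (numerically) unchanged.
model = DeepSets(input_dim=1, latent_dim=8)
x = torch.randn(2, 5, 1)
x_perm = x[:, torch.randperm(5), :]
assert torch.allclose(model(x), model(x_perm), atol=1e-5)
```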
