HATS: A Hierarchical Sequence-Attention Framework for Inductive Set-of-Sets Embeddings

In many complex domains, the input data are not well suited to the fixed-size vector representations typically used in deep learning models. In relational learning and computer vision, for example, the data are often better represented as sets (e.g., the neighborhood of a node, a cloud of points). In these settings, a key challenge is to learn an embedding function that is invariant to permutations of the input. While recent work has proposed principled methods for learning permutation-invariant representations of sets, these approaches do not extend readily to set-of-sets (SoS) tasks, such as subgraph prediction and scene classification. In this work, we develop a deep neural network framework for learning inductive SoS embeddings that are invariant to SoS permutations, i.e., to reorderings of both the inner sets and the elements within each set. Specifically, we propose HATS, a hierarchical sequence model with attention mechanisms for inductive set-of-sets embeddings. We develop stochastic optimization and inference methods for learning HATS, and our experiments demonstrate that it achieves superior performance across a wide range of SoS tasks.
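The abstract does not spell out the architecture, so the following is only a minimal PyTorch sketch of the general idea: a two-level, permutation-invariant encoder that pools each inner set with attention and then pools the resulting set embeddings. It uses simple attention pooling in place of the sequence model that HATS employs, and all names here (AttentionPool, SoSEncoder, phi, rho) are illustrative, not the authors' API.

```python
import torch
import torch.nn as nn

class AttentionPool(nn.Module):
    """Permutation-invariant attention pooling over a set of vectors."""
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, x):                          # x: (num_elements, dim)
        w = torch.softmax(self.score(x), dim=0)    # attention weight per element
        return (w * x).sum(dim=0)                  # weighted sum -> (dim,)

class SoSEncoder(nn.Module):
    """Hierarchical encoder: embed and pool each inner set,
    then pool the set of inner-set embeddings."""
    def __init__(self, in_dim, hid_dim):
        super().__init__()
        self.phi = nn.Sequential(nn.Linear(in_dim, hid_dim), nn.ReLU())
        self.inner_pool = AttentionPool(hid_dim)
        self.rho = nn.Sequential(nn.Linear(hid_dim, hid_dim), nn.ReLU())
        self.outer_pool = AttentionPool(hid_dim)

    def forward(self, sos):                        # sos: list of (n_i, in_dim) tensors
        inner = torch.stack([self.inner_pool(self.phi(s)) for s in sos])
        return self.outer_pool(self.rho(inner))    # (hid_dim,)

# Usage: embed a set of three inner sets of 5-dimensional points.
enc = SoSEncoder(in_dim=5, hid_dim=16)
sos = [torch.randn(n, 5) for n in (4, 7, 2)]
z = enc(sos)
print(z.shape)  # torch.Size([16])
```

Both pooling stages are invariant because the softmax weights depend only on each element's own score and the pooled output is a sum, so reordering elements within an inner set, or reordering the inner sets themselves, leaves the embedding unchanged.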
