Hyperbolic Deep Neural Networks: A Survey

Recently, hyperbolic deep neural networks (HDNNs) have been gaining momentum because deep representations in hyperbolic space provide high-fidelity embeddings with few dimensions, especially for data possessing hierarchical structure. Such hyperbolic neural architectures have quickly been extended to many scientific fields, including natural language processing, single-cell RNA-sequence analysis, graph embedding, financial analysis, and computer vision. The promising results demonstrate superior capability, significantly more compact models, and substantially better physical interpretability than their Euclidean counterparts. To stimulate future research, this paper presents a coherent and comprehensive review of the literature on the neural components used to construct HDNNs, as well as the generalization of leading deep approaches to hyperbolic space. It also presents current applications across various tasks, together with insightful observations, open questions, and promising future directions.
