Poincaré Embeddings for Learning Hierarchical Representations

Representation learning has become an invaluable approach for learning from symbolic data such as text and graphs. However, while complex symbolic datasets often exhibit a latent hierarchical structure, state-of-the-art methods typically learn embeddings in Euclidean vector spaces, which do not account for this property. For this purpose, we introduce a new approach for learning hierarchical representations of symbolic data by embedding them into hyperbolic space -- or more precisely into an n-dimensional Poincare ball. Due to the underlying hyperbolic geometry, this allows us to learn parsimonious representations of symbolic data by simultaneously capturing hierarchy and similarity. We introduce an efficient algorithm to learn the embeddings based on Riemannian optimization and show experimentally that Poincare embeddings outperform Euclidean embeddings significantly on data with latent hierarchies, both in terms of representation capacity and in terms of generalization ability.

[1]  George Kingsley Zipf,et al.  Human Behaviour and the Principle of Least Effort: an Introduction to Human Ecology , 2012 .

[2]  Yuen Ren Chao,et al.  Human Behavior and the Principle of Least Effort: An Introduction to Human Ecology , 1950 .

[3]  Zellig S. Harris,et al.  Distributional Structure , 1954 .

[4]  J. R. Firth,et al.  A Synopsis of Linguistic Theory, 1930-1955 , 1957 .

[5]  Zellig S. Harris,et al.  Distributional Structure , 1954 .

[6]  Shun-ichi Amari,et al.  Natural Gradient Works Efficiently in Learning , 1998, Neural Computation.

[7]  Geoffrey E. Hinton,et al.  Learning Distributed Representations of Concepts Using Linear Relational Embedding , 2001, IEEE Trans. Knowl. Data Eng..

[8]  Peter D. Hoff,et al.  Latent Space Approaches to Social Network Analysis , 2002 .

[9]  Albert-László Barabási,et al.  Hierarchical organization in complex networks. , 2003, Physical review. E, Statistical, nonlinear, and soft matter physics.

[10]  Joshua B. Tenenbaum,et al.  The Large-Scale Structure of Semantic Networks: Statistical Analyses and a Model of Semantic Growth , 2001, Cogn. Sci..

[11]  Helge J. Ritter,et al.  Large-scale data exploration with the hierarchically growing hyperbolic SOM , 2006, Neural Networks.

[12]  A. O. Houcine On hyperbolic groups , 2006 .

[13]  Robert D. Kleinberg Geographic Routing Using Hyperbolic Space , 2007, IEEE INFOCOM 2007 - 26th IEEE International Conference on Computer Communications.

[14]  M. Newman,et al.  Hierarchical structure and the prediction of missing links in networks , 2008, Nature.

[15]  Marián Boguñá,et al.  Sustaining the Internet with Hyperbolic Mapping , 2010, Nature communications.

[16]  Amin Vahdat,et al.  Hyperbolic Geometry of Complex Networks , 2010, Physical review. E, Statistical, nonlinear, and soft matter physics.

[17]  Stephen J. Wright,et al.  Hogwild: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent , 2011, NIPS.

[18]  Hans-Peter Kriegel,et al.  A Three-Way Model for Collective Learning on Multi-Relational Data , 2011, ICML.

[19]  Blair D. Sullivan,et al.  Tree-Like Structure in Large Social and Information Networks , 2013, 2013 IEEE 13th International Conference on Data Mining.

[20]  Jason Weston,et al.  Translating Embeddings for Modeling Multi-relational Data , 2013, NIPS.

[21]  Andrew McCallum,et al.  Relation Extraction with Matrix Factorization and Universal Schemas , 2013, NAACL.

[22]  Silvere Bonnabel,et al.  Stochastic Gradient Descent on Riemannian Manifolds , 2011, IEEE Transactions on Automatic Control.

[23]  Jeffrey Dean,et al.  Distributed Representations of Words and Phrases and their Compositionality , 2013, NIPS.

[24]  Xueyan Jiang,et al.  Reducing the Rank in Relational Factorization Models by Including Observable Patterns , 2014, NIPS.

[25]  David J. Weir,et al.  Learning to Distinguish Hypernyms and Co-Hyponyms , 2014, COLING.

[26]  Jeffrey Pennington,et al.  GloVe: Global Vectors for Word Representation , 2014, EMNLP.

[27]  Steven Skiena,et al.  DeepWalk: online learning of social representations , 2014, KDD.

[28]  Guillaume Bouchard,et al.  On Approximate Reasoning Capabilities of Low-Rank Vector Spaces , 2015, AAAI Spring Symposia.

[29]  Ke Sun,et al.  Space-Time Local Embeddings , 2015, NIPS.

[30]  Andrew McCallum,et al.  Word Representations via Gaussian Embedding , 2014, ICLR.

[31]  Stephen Clark,et al.  Exploiting Image Generality for Lexical Entailment Detection , 2015, ACL.

[32]  Cosma Rohilla Shalizi,et al.  Geometric Network Comparisons , 2015, UAI.

[33]  Guillaume Bouchard,et al.  Complex Embeddings for Simple Link Prediction , 2016, ICML.

[34]  Jure Leskovec,et al.  node2vec: Scalable Feature Learning for Networks , 2016, KDD.

[35]  Sanja Fidler,et al.  Order-Embeddings of Images and Language , 2015, ICLR.

[36]  Lorenzo Rosasco,et al.  Holographic Embeddings of Knowledge Graphs , 2015, AAAI.

[37]  Thomas Demeester,et al.  Lifted Rule Injection for Relation Embeddings , 2016, EMNLP.

[38]  Suvrit Sra,et al.  Fast stochastic optimization on Riemannian manifolds , 2016, ArXiv.

[39]  Tomas Mikolov,et al.  Bag of Tricks for Efficient Text Classification , 2016, EACL.

[40]  Felix Hill,et al.  HyperLex: A Large-Scale Evaluation of Graded Lexical Entailment , 2016, CL.

[41]  Tomas Mikolov,et al.  Enriching Word Vectors with Subword Information , 2016, TACL.