Paper Information - Poincaré Embeddings for Learning Hierarchical Representations

Representation learning has become an invaluable approach for learning from symbolic data such as text and graphs. However, while complex symbolic datasets often exhibit a latent hierarchical structure, state-of-the-art methods typically learn embeddings in Euclidean vector spaces, which do not account for this property. For this purpose, we introduce a new approach for learning hierarchical representations of symbolic data by embedding them into hyperbolic space, or more precisely into an n-dimensional Poincaré ball. Due to the underlying hyperbolic geometry, this allows us to learn parsimonious representations of symbolic data by simultaneously capturing hierarchy and similarity. We introduce an efficient algorithm to learn the embeddings based on Riemannian optimization and show experimentally that Poincaré embeddings outperform Euclidean embeddings significantly on data with latent hierarchies, both in terms of representation capacity and in terms of generalization ability.
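The abstract names the two ingredients, a Poincaré ball as the embedding space and Riemannian optimization as the training procedure, without giving formulas. As a minimal sketch, assuming the standard Poincaré ball distance d(u, v) = arcosh(1 + 2‖u − v‖² / ((1 − ‖u‖²)(1 − ‖v‖²))) and a Riemannian SGD step that rescales the Euclidean gradient by the inverse metric factor (1 − ‖θ‖²)² / 4 before projecting back into the ball, the core operations could look like the following. The function names, the `EPS` margin, and the learning rate are illustrative assumptions, not taken from this page.

```python
import math

# Assumed safety margin that keeps points strictly inside the open unit ball.
EPS = 1e-5


def sq_norm(x):
    """Squared Euclidean norm of a vector given as a list of floats."""
    return sum(xi * xi for xi in x)


def poincare_distance(u, v):
    """Geodesic distance between two points of the open unit ball."""
    diff = sq_norm([a - b for a, b in zip(u, v)])
    denom = (1.0 - sq_norm(u)) * (1.0 - sq_norm(v))
    return math.acosh(1.0 + 2.0 * diff / denom)


def project(x):
    """Pull a point back inside the ball if an update pushed it out."""
    norm = math.sqrt(sq_norm(x))
    if norm >= 1.0:
        scale = (1.0 - EPS) / norm
        return [xi * scale for xi in x]
    return x


def rsgd_step(theta, euclid_grad, lr):
    """One Riemannian SGD step (hypothetical helper).

    The Euclidean gradient is rescaled by (1 - ||theta||^2)^2 / 4,
    a plain gradient step is taken, and the result is projected.
    """
    scale = (1.0 - sq_norm(theta)) ** 2 / 4.0
    return project([t - lr * scale * g for t, g in zip(theta, euclid_grad)])


if __name__ == "__main__":
    # Distance from the origin to (0.5, 0) equals ln 3, about 1.0986.
    print(poincare_distance([0.0, 0.0], [0.5, 0.0]))
    # Points near the boundary are far apart even when close in Euclidean terms.
    print(poincare_distance([0.95, 0.0], [0.0, 0.95]))
```

Distances grow rapidly toward the boundary of the ball, which is what lets a low-dimensional embedding place general items near the origin and specific items near the rim. The gradient rescaling shrinks updates for points close to the boundary, so they move slowly in Euclidean terms.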

Douwe Kiela | Maximilian Nickel
