Differentiating through the Fréchet Mean

Recent advances in deep representation learning on Riemannian manifolds extend classical deep learning operations to better capture the underlying geometry. One such extension is the Fréchet mean, the generalization of the Euclidean mean; however, it has been difficult to apply because it lacks a closed form with an easily computable derivative. In this paper, we show how to differentiate through the Fréchet mean for arbitrary Riemannian manifolds. Then, focusing on hyperbolic space, we derive explicit gradient expressions and a fast, accurate, and hyperparameter-free Fréchet mean solver. This fully integrates the Fréchet mean into the hyperbolic neural network pipeline. To demonstrate this integration, we present two case studies. First, we replace the projected aggregation step of the existing Hyperbolic Graph Convolutional Network with our Fréchet mean, obtaining state-of-the-art results on datasets with high hyperbolicity. Second, to illustrate the Fréchet mean's capacity to generalize Euclidean neural network operations, we develop a hyperbolic batch normalization method whose improvement parallels the one observed in the Euclidean setting.
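To make the object concrete: the Fréchet mean of points x_1, …, x_n on a manifold is the minimizer of the sum of squared geodesic distances, μ = argmin_m Σ_i d(m, x_i)². The sketch below computes it on the 2D Poincaré ball by plain gradient descent with finite-difference gradients. This is a naive illustrative baseline, not the paper's solver (which is faster, exact in its gradients, and hyperparameter-free); the function names, learning rate, and step count are all illustrative assumptions.

```python
import math

def poincare_dist(u, v):
    """Geodesic distance between points of the open unit ball
    in the Poincaré ball model of hyperbolic space."""
    du2 = sum((a - b) ** 2 for a, b in zip(u, v))
    nu2 = sum(a * a for a in u)
    nv2 = sum(b * b for b in v)
    z = 1.0 + 2.0 * du2 / ((1.0 - nu2) * (1.0 - nv2))
    return math.acosh(z)

def frechet_mean(points, lr=0.01, steps=5000, h=1e-6):
    """Minimize the sum of squared hyperbolic distances by gradient
    descent, estimating the gradient with central finite differences.
    A minimal sketch for intuition only; hyperparameters are ad hoc."""
    dim = len(points[0])
    mu = list(points[0])  # start at the first point

    def objective(m):
        return sum(poincare_dist(m, p) ** 2 for p in points)

    for _ in range(steps):
        grad = []
        for i in range(dim):
            mp, mm = list(mu), list(mu)
            mp[i] += h
            mm[i] -= h
            grad.append((objective(mp) - objective(mm)) / (2.0 * h))
        mu = [m - lr * g for m, g in zip(mu, grad)]
    return mu
```

For a point set symmetric about the origin, the minimizer is the origin, which gives an easy sanity check; because the objective is geodesically convex on this Hadamard manifold, the minimizer is unique.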
