An Inductive Bias for Distances: Neural Nets that Respect the Triangle Inequality

Distances are pervasive in machine learning. They serve as similarity measures, loss functions, and learning targets; it is said that a good distance measure solves a task. When defining distances, the triangle inequality has proven to be a useful constraint, both theoretically--to prove convergence and optimality guarantees--and empirically--as an inductive bias. Deep metric learning architectures that respect the triangle inequality rely, almost exclusively, on Euclidean distance in the latent space. Though effective, this fails to model two broad classes of subadditive distances, common in graphs and reinforcement learning: asymmetric metrics, and metrics that cannot be embedded into Euclidean space. To address these problems, we introduce novel architectures that are guaranteed to satisfy the triangle inequality. We prove our architectures universally approximate norm-induced metrics on $\mathbb{R}^n$, and present a similar result for modified Input Convex Neural Networks. We show that our architectures outperform existing metric approaches when modeling graph distances and have a better inductive bias than non-metric approaches when training data is limited in the multi-goal reinforcement learning setting.
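
To make the construction in the abstract concrete, here is a minimal PyTorch-style sketch (an assumption for illustration, not the authors' released code) of one way to obtain a learned distance that satisfies the triangle inequality by construction: compose an unconstrained encoder f with a learned subadditive, positively homogeneous function N and set d(x, y) = N(f(x) - f(y)). All class names, layer sizes, and the specific parameterizations below (a nonnegative mixture of Mahalanobis-style norms, and a ReLU-based asymmetric variant) are illustrative choices in the spirit of the paper, not its exact architectures.

```python
# Sketch: latent distances d(x, y) = N(f(x) - f(y)) that obey the triangle
# inequality because N is subadditive and positively homogeneous:
#   f(x) - f(z) = (f(x) - f(y)) + (f(y) - f(z))
#   => N(f(x) - f(z)) <= N(f(x) - f(y)) + N(f(y) - f(z)).
# Names and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MixtureOfMahalanobisNorms(nn.Module):
    """N(z) = sum_k softplus(w_k) * ||A_k z||_2 -- a (semi)norm, hence subadditive."""

    def __init__(self, dim, n_components=8, component_dim=16):
        super().__init__()
        self.A = nn.Parameter(torch.randn(n_components, component_dim, dim) * 0.1)
        self.w = nn.Parameter(torch.zeros(n_components))  # softplus keeps weights >= 0

    def forward(self, z):                                   # z: (batch, dim)
        proj = torch.einsum("kcd,bd->bkc", self.A, z)       # (batch, k, component_dim)
        comps = proj.norm(dim=-1)                           # (batch, k) Mahalanobis-style terms
        return comps @ F.softplus(self.w)                   # (batch,) nonnegative combination


class AsymmetricSubadditiveNorm(nn.Module):
    """N(z) = ||relu(W z)||_2: positively homogeneous and subadditive, but in
    general N(z) != N(-z), so the induced distance can be asymmetric."""

    def __init__(self, dim, hidden=32):
        super().__init__()
        self.W = nn.Parameter(torch.randn(hidden, dim) * 0.1)

    def forward(self, z):
        return F.relu(z @ self.W.t()).norm(dim=-1)


class TriangleInequalityDistance(nn.Module):
    """d(x, y) = N(f(x) - f(y)); the encoder f is unconstrained."""

    def __init__(self, encoder, norm):
        super().__init__()
        self.encoder, self.norm = encoder, norm

    def forward(self, x, y):
        return self.norm(self.encoder(x) - self.encoder(y))
```

Because N(a + b) <= N(a) + N(b), the inequality d(x, z) <= d(x, y) + d(y, z) holds at every point of the input space, not just on training pairs, regardless of how the encoder is trained; the ReLU-based variant additionally allows d(x, y) != d(y, x), the kind of asymmetric, subadditive distance that arises in directed graphs and multi-goal reinforcement learning.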
