Hyperspherical Prototype Networks

This paper introduces hyperspherical prototype networks, which unify classification and regression with prototypes on hyperspherical output spaces. For classification, a common approach is to define prototypes as the mean output vector over training examples per class. Here, we propose to use hyperspheres as output spaces, with class prototypes defined a priori with large margin separation. We position prototypes through data-independent optimization, with an extension to incorporate priors from class semantics. By doing so, we do not require any prototype updating, we can handle any training set size, and the output dimensionality is no longer constrained to the number of classes. Furthermore, we generalize to regression by optimizing outputs as an interpolation between two prototypes on the hypersphere. Since both tasks are now defined by the same loss function, they can be jointly trained for multi-task problems. Experimentally, we show the benefit of hyperspherical prototype networks for classification, regression, and their combination over other prototype methods, softmax cross-entropy, and mean squared error approaches.
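The two ingredients described above, data-independent prototype positioning and a single hyperspherical loss shared by classification and regression, can be sketched concretely. Below is a minimal PyTorch sketch; the function names, optimizer settings, and the squared-cosine form of the loss are illustrative assumptions consistent with the description above, not the paper's exact implementation.

```python
import torch
import torch.nn.functional as F


def position_prototypes(num_classes: int, dims: int,
                        steps: int = 1000, lr: float = 0.1) -> torch.Tensor:
    """Place num_classes prototypes on the unit hypersphere in dims
    dimensions with large margin separation, independent of any data.

    Sketch: repeatedly minimize each prototype's largest cosine
    similarity to any other prototype, re-projecting onto the sphere
    after every gradient step. Hyperparameters are illustrative.
    """
    protos = F.normalize(torch.randn(num_classes, dims), dim=1)
    protos.requires_grad_(True)
    optimizer = torch.optim.SGD([protos], lr=lr, momentum=0.9)
    diag = torch.eye(num_classes, dtype=torch.bool)

    for _ in range(steps):
        # Pairwise cosine similarities; mask out self-similarity.
        sims = (protos @ protos.t()).masked_fill(diag, -2.0)
        # Push apart only each prototype's nearest neighbour.
        loss = sims.max(dim=1).values.mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        # Re-project onto the unit hypersphere.
        with torch.no_grad():
            protos /= protos.norm(dim=1, keepdim=True)

    return protos.detach()


def hypersphere_loss(outputs: torch.Tensor,
                     targets: torch.Tensor) -> torch.Tensor:
    """Squared cosine distance between network outputs and target points
    on the hypersphere. Assumption: a squared (1 - cos) penalty is used
    here as one plausible instantiation of the shared loss."""
    cos = F.cosine_similarity(outputs, targets, dim=1)
    return ((1.0 - cos) ** 2).mean()
```

Because the prototypes are fixed before training, the targets for classification are simply rows of `position_prototypes(num_classes, dims)` indexed by label, while regression targets are interpolations between two pole prototypes; feeding either through the same `hypersphere_loss` is what allows classification and regression to be trained jointly. Note also that `dims` is a free choice here, reflecting that the output dimensionality is not tied to the number of classes.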
