Clustering-Oriented Representation Learning with Attractive-Repulsive Loss

The standard loss function used to train neural network classifiers, categorical cross-entropy (CCE), seeks to maximize accuracy on the training data; building useful representations is not a necessary byproduct of this objective. In this work, we propose clustering-oriented representation learning (COREL) as an alternative to CCE in the context of a generalized attractive-repulsive loss framework. COREL has the consequence of building latent representations that collectively exhibit the quality of natural clustering within the latent space of the final hidden layer, according to a predefined similarity function. Despite being simple to implement, COREL variants outperform or perform equivalently to CCE in a variety of scenarios, including image and news article classification using both feed-forward and convolutional neural networks. Analysis of the latent spaces created with different similarity functions facilitates insights on the different use cases COREL variants can satisfy, where the Cosine-COREL variant makes a consistently clusterable latent space, while Gaussian-COREL consistently obtains better classification accuracy than CCE.

[1]  Roland Vollgraf,et al.  Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms , 2017, ArXiv.

[2]  James Philbin,et al.  FaceNet: A unified embedding for face recognition and clustering , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[3]  Marios Savvides,et al.  Ring Loss: Convex Feature Normalization for Face Recognition , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[4]  P ? ? ? ? ? ? ? % ? ? ? ? , 1991 .

[5]  David Vandyke,et al.  Counter-fitting Word Vectors to Linguistic Constraints , 2016, NAACL.

[6]  Xing Ji,et al.  CosFace: Large Margin Cosine Loss for Deep Face Recognition , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[7]  Song Bai,et al.  Triplet-Center Loss for Multi-view 3D Object Retrieval , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[8]  Qingmin Liao,et al.  Margin Loss: Making Faces More Separable , 2018, IEEE Signal Processing Letters.

[9]  Yu Liu,et al.  Learning Deep Features via Congenerous Cosine Loss for Person Recognition , 2017, ArXiv.

[10]  Zsolt Kira,et al.  Learning to cluster in order to Transfer across domains and tasks , 2017, ICLR.

[11]  Daniel Cohen-Or,et al.  Clustering-Driven Deep Embedding With Pairwise Constraints , 2018, IEEE Computer Graphics and Applications.

[12]  Rob Fergus,et al.  Visualizing and Understanding Convolutional Networks , 2013, ECCV.

[13]  Zsolt Kira,et al.  Neural network-based clustering using pairwise constraints , 2015, ArXiv.

[14]  Yoshua Bengio,et al.  Gradient-based learning applied to document recognition , 1998, Proc. IEEE.

[15]  James Bailey,et al.  Information Theoretic Measures for Clusterings Comparison: Variants, Properties, Normalization and Correction for Chance , 2010, J. Mach. Learn. Res..

[16]  Yu Qiao,et al.  A Discriminative Feature Learning Approach for Deep Face Recognition , 2016, ECCV.

[17]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[18]  Yoon Kim,et al.  Convolutional Neural Networks for Sentence Classification , 2014, EMNLP.

[19]  Jackie Chi Kit Cheung,et al.  Resolving Event Coreference with Supervised Representation Learning and Clustering-Oriented Regularization , 2018, *SEM@NAACL-HLT.

[20]  Xiang Zhang,et al.  Character-level Convolutional Networks for Text Classification , 2015, NIPS.

[21]  Jiansheng Chen,et al.  Rethinking Feature Distribution for Loss Functions in Image Classification , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[22]  G. G. Stokes "J." , 1890, The New Yale Book of Quotations.

[23]  Carlos D. Castillo,et al.  L2-constrained Softmax Loss for Discriminative Face Verification , 2017, ArXiv.

[24]  Hinrich Schütze,et al.  Introduction to information retrieval , 2008 .

[25]  Jian Cheng,et al.  NormFace: L2 Hypersphere Embedding for Face Verification , 2017, ACM Multimedia.

[26]  P. Rousseeuw Silhouettes: a graphical aid to the interpretation and validation of cluster analysis , 1987 .

[27]  Pascal Vincent,et al.  Representation Learning: A Review and New Perspectives , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[28]  Matthew A. Brown,et al.  Low-Shot Learning with Imprinted Weights , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[29]  David J. Fleet,et al.  Hamming Distance Metric Learning , 2012, NIPS.

[30]  Manohar Paluri,et al.  Metric Learning with Adaptive Density Discrimination , 2015, ICLR.

[31]  Jeffrey Pennington,et al.  GloVe: Global Vectors for Word Representation , 2014, EMNLP.

[32]  J. Munkres ALGORITHMS FOR THE ASSIGNMENT AND TRANSIORTATION tROBLEMS* , 1957 .