Paper Information - Clustering-Oriented Representation Learning with Attractive-Repulsive Loss

The standard loss function used to train neural network classifiers, categorical cross-entropy (CCE), seeks to maximize accuracy on the training data; building useful representations is not a necessary byproduct of this objective. In this work, we propose clustering-oriented representation learning (COREL) as an alternative to CCE in the context of a generalized attractive-repulsive loss framework. COREL has the consequence of building latent representations that collectively exhibit the quality of natural clustering within the latent space of the final hidden layer, according to a predefined similarity function. Despite being simple to implement, COREL variants outperform or perform equivalently to CCE in a variety of scenarios, including image and news article classification using both feed-forward and convolutional neural networks. Analysis of the latent spaces created with different similarity functions facilitates insights on the different use cases COREL variants can satisfy, where the Cosine-COREL variant makes a consistently clusterable latent space, while Gaussian-COREL consistently obtains better classification accuracy than CCE.
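The abstract does not spell out the loss itself, so the following is only a minimal sketch of what a generic attractive-repulsive loss could look like, not the authors' exact formulation. It assumes one learnable center vector per class, an attractive term that rewards similarity between a latent vector and its own class center, and a repulsive term (here a log-sum-exp over all centers) that penalizes similarity to every center. The function names, the weighting parameter `lam`, the scale `gamma`, and the choice of repulsive term are all hypothetical and chosen for illustration.

```python
import math


def cosine_similarity(h, w):
    """Cosine of the angle between latent vector h and class center w."""
    dot = sum(a * b for a, b in zip(h, w))
    norm_h = math.sqrt(sum(a * a for a in h))
    norm_w = math.sqrt(sum(b * b for b in w))
    return dot / (norm_h * norm_w + 1e-12)  # small epsilon avoids division by zero


def gaussian_similarity(h, w, gamma=0.5):
    """Negative scaled squared Euclidean distance (log of a Gaussian kernel).

    gamma is an assumed scale parameter, not a value taken from the paper.
    """
    sq_dist = sum((a - b) ** 2 for a, b in zip(h, w))
    return -gamma * sq_dist


def attractive_repulsive_loss(h, centers, label, similarity, lam=0.5):
    """Illustrative attractive-repulsive loss for one example.

    h          : latent representation from the final hidden layer
    centers    : one center vector per class (assumed learnable parameters)
    label      : index of the true class
    similarity : function (h, w) -> float
    lam        : assumed trade-off between the attractive and repulsive terms
    """
    sims = [similarity(h, c) for c in centers]
    # Attractive term: pull h toward the center of its own class.
    attractive = -sims[label]
    # Repulsive term (an assumption): numerically stable log-sum-exp over all
    # centers, which grows when h is similar to any center.
    m = max(sims)
    repulsive = m + math.log(sum(math.exp(s - m) for s in sims))
    return lam * attractive + (1.0 - lam) * repulsive


if __name__ == "__main__":
    centers = [[1.0, 0.0], [0.0, 1.0]]
    h = [1.0, 0.1]  # latent vector lying close to the first center
    for name, sim in [("cosine", cosine_similarity), ("gaussian", gaussian_similarity)]:
        correct = attractive_repulsive_loss(h, centers, 0, sim)
        wrong = attractive_repulsive_loss(h, centers, 1, sim)
        print(name, "loss with true label:", correct, "loss with wrong label:", wrong)
```

Under these assumptions, setting `lam=0.5` with the Gaussian similarity makes the loss equal to half of a softmax cross-entropy whose logits are the negative scaled squared distances to the centers, which is one way to see how such a framework can generalize CCE.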
