AdaCos: Adaptively Scaling Cosine Logits for Effectively Learning Deep Face Representations

Cosine-based softmax losses and their variants have achieved great success in deep-learning-based face recognition. However, the hyperparameter settings of these losses significantly influence both the optimization path and the final recognition performance. Tuning those hyperparameters manually relies heavily on user experience and requires many training tricks. In this paper, we investigate in depth the effects of two important hyperparameters of cosine-based softmax losses, the scale parameter and the angular margin parameter, by analyzing how they modulate the predicted classification probability. Based on this analysis, we propose a novel cosine-based softmax loss, AdaCos, which is hyperparameter-free and leverages an adaptive scale parameter to automatically strengthen the training supervision during the training process. We apply the proposed AdaCos loss to large-scale face verification and identification datasets, including LFW, MegaFace, and IJB-C 1:1 Verification. Our results show that training deep neural networks with the AdaCos loss is stable and achieves high face recognition accuracy. Our method outperforms state-of-the-art softmax losses on all three datasets.
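
To make the role of the two hyperparameters concrete, the following is a minimal sketch, not the paper's implementation, of how a scale and an angular margin could modulate the predicted probability of the ground-truth class in a generic cosine-based softmax. The function name `target_probability`, its parameters, and the simplifying assumption that every non-target class is orthogonal to the feature (cosine 0) are illustrative choices that do not come from the abstract.

```python
import numpy as np


def target_probability(theta, scale, margin=0.0, num_classes=10000):
    """Probability assigned to the ground-truth class by a cosine softmax.

    theta       : angle in radians between a feature and its class centre
    scale       : multiplier applied to the cosine logit
    margin      : additive angular margin applied to the target angle
    num_classes : total number of classes

    Assumption (illustrative only): every non-target class has cosine 0
    with the feature, so each contributes exp(0) = 1 to the denominator.
    """
    theta = np.asarray(theta, dtype=float)
    target_logit = scale * np.cos(theta + margin)
    background = num_classes - 1
    # exp(t) / (exp(t) + B) rewritten as 1 / (1 + B * exp(-t))
    return 1.0 / (1.0 + background * np.exp(-target_logit))


if __name__ == "__main__":
    angles = np.linspace(0.0, np.pi / 2, 5)
    for s in (16.0, 32.0, 64.0):
        probs = target_probability(angles, scale=s)
        print(f"scale={s:5.1f}", np.round(probs, 4))
    for m in (0.0, 0.25, 0.5):
        probs = target_probability(angles, scale=32.0, margin=m)
        print(f"margin={m:4.2f}", np.round(probs, 4))
```

Under these assumptions, sweeping `scale` shows how a larger value sharpens the transition of the probability curve over the angle, while a larger `margin` shifts the curve so that a smaller angle is needed to reach the same probability.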
