Towards Self-supervised Face Labeling via Cross-modality Association

Face recognition has become the de facto authentication solution in a broad spectrum of applications, from smart buildings to industrial monitoring and security services. However, in many of these real-world scenarios, tracking or identifying people with facial recognition is extremely challenging due to environmental variations such as lighting conditions, camera viewing angles, and subject motion. Most state-of-the-art face recognition systems must be trained on a large dataset containing a wide variety of labelled face images to work well; collecting and manually labelling such datasets, however, is difficult and time consuming, arguably more so than developing the algorithms themselves. In this paper, we propose a novel framework to automatically label user identities with their face images in smart spaces, exploiting the fact that users tend to carry their smart devices while they are seen by surveillance cameras. We evaluate our method on 10 users in a smart building setting, and the experimental results show that it achieves an average F1 score above 0.9.
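The core idea, associating face observations with the identities of co-present smart devices, can be illustrated with a minimal sketch. This is not the paper's actual method; it assumes hypothetical inputs: `face_sightings` as `(face_cluster, timestamp)` pairs from a camera pipeline, and `device_sightings` as `(device_id, timestamp)` pairs from, say, Wi-Fi or BLE logs. Pairs seen close together in time accumulate a co-occurrence score, and each face cluster is then greedily labelled with the best-matching device identity.

```python
from collections import defaultdict

def label_faces_by_cooccurrence(face_sightings, device_sightings, window=5.0):
    """Label face clusters with device identities by temporal co-occurrence.

    face_sightings:   list of (face_cluster_id, timestamp_seconds)
    device_sightings: list of (device_id, timestamp_seconds)
    window:           max time gap (seconds) to count as co-occurrence
    """
    # Count how often each (face cluster, device) pair is observed
    # within `window` seconds of each other.
    scores = defaultdict(int)
    for cluster, t_face in face_sightings:
        for device, t_dev in device_sightings:
            if abs(t_face - t_dev) <= window:
                scores[(cluster, device)] += 1

    # Greedy one-to-one assignment: strongest co-occurrence first.
    labels, used_devices = {}, set()
    for (cluster, device), _ in sorted(scores.items(), key=lambda kv: -kv[1]):
        if cluster not in labels and device not in used_devices:
            labels[cluster] = device
            used_devices.add(device)
    return labels
```

For example, if face cluster `f0` repeatedly appears near the sighting times of `alice_phone`, it is labelled accordingly. A real system would need to handle multiple co-present users, noisy timestamps, and missing device signals, e.g. via probabilistic matching rather than this greedy heuristic.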
