Learning to Detect Important People in Unlabelled Images for Semi-Supervised Important People Detection

Important people detection aims to automatically identify the individuals who play the most important roles in a social event image, which requires a model to understand high-level patterns. However, existing methods rely heavily on supervised learning with large quantities of annotated images, and these annotations are more costly to collect for important people detection than for individual entity recognition (e.g., object recognition). To overcome this problem, we propose learning important people detection on partially annotated images. Our approach iteratively learns to assign pseudo-labels to individuals in unannotated images and updates the important people detection model using data with both labels and pseudo-labels. To alleviate the pseudo-labelling imbalance problem, we introduce a ranking strategy for pseudo-label estimation, together with two weighting strategies. The first weights the confidence that individuals are important people, which strengthens learning on important people. The second neglects noisy unlabelled images (i.e., images without any important people). We have collected two large-scale datasets for evaluation. Extensive experimental results confirm that our method improves the performance of important people detection by leveraging unlabelled images.
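The abstract does not give the exact form of the ranking or weighting strategies. The following is a minimal Python sketch, under stated assumptions, of how one pseudo-labelling step with two weights might be computed for a single unlabelled image. Several choices here are illustrative assumptions and not the authors' formulation: the softmax over per-person importance scores, the top-1 ranking rule, the confidence weight (the probability assigned to the pseudo-labelled person), and the entropy-based image weight. All function names are hypothetical.

```python
import math
from typing import List, Tuple


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over the individuals in one image."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def rank_pseudo_label(scores: List[float]) -> int:
    """Hypothetical ranking rule: the top-ranked person becomes the
    pseudo-important person (ties go to the first index)."""
    return max(range(len(scores)), key=lambda i: scores[i])


def confidence_weight(probs: List[float], label: int) -> float:
    """Hypothetical person-level weight: the model's confidence that the
    pseudo-labelled individual is an important person."""
    return probs[label]


def image_weight(probs: List[float]) -> float:
    """Hypothetical image-level weight: 1 - normalised entropy.
    A near-uniform distribution (nobody stands out) gets a weight near 0,
    so likely noisy images contribute almost nothing."""
    n = len(probs)
    if n < 2:
        return 1.0  # entropy is undefined for one person; keep the image
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    return max(0.0, 1.0 - entropy / math.log(n))


def pseudo_label_loss(scores: List[float]) -> Tuple[int, float]:
    """Return (pseudo-label index, weighted cross-entropy) for one image.
    In a real training loop the weights would be treated as constants."""
    probs = softmax(scores)
    label = rank_pseudo_label(scores)
    w = confidence_weight(probs, label) * image_weight(probs)
    return label, -w * math.log(probs[label])
```

For example, `pseudo_label_loss([0.0, 0.0, 0.0])` gives a loss of approximately zero because no individual stands out, so the image is effectively ignored. A clear top-ranked person instead yields a nonzero, confidence-weighted loss. Assigning a pseudo-label by rank rather than by a score threshold guarantees one positive per image, which is one simple way to keep pseudo-labels from collapsing onto the majority non-important class.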
