Social Relationship Recognition Based on A Hybrid Deep Neural Network

Social relations reveal the interpersonal associations between people. Techniques that automatically recognize social relations from visual data have great potential for improving human-computer interaction. In this paper, a hybrid deep network is proposed to predict the social relation between two people in an image. Unlike existing methods, which typically train deep models from scratch, the proposed approach fine-tunes a VGG-FACE model, previously trained for face recognition, on a social relation database and uses it as the branches of a siamese-like network. In addition, a second deep network is proposed to extract, from the whole image, scene features that carry high-level information related to social relations; its predictions are fused with those of the siamese network to produce the final result. Experiments show that the proposed approach avoids the effort of pre-training and of preparing auxiliary datasets (i.e., facial attribute datasets) and outperforms state-of-the-art methods.
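The abstract describes the architecture only at a high level. The following is a minimal sketch in plain Python of how a siamese pair branch and a scene branch could be combined by late fusion; it is an illustration under stated assumptions, not the authors' implementation. Assumptions: a single shared linear+ReLU layer stands in for the fine-tuned VGG-FACE branches, a linear classifier stands in for the scene network, the input vectors stand in for precomputed face and whole-image features, and the fusion rule (a weighted average of softmax probabilities with a hypothetical weight `alpha`) is one common choice, since the abstract does not state how the predictions are fused. All function names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]  # row-major: one row per output unit


def matvec(w: Matrix, x: Sequence[float]) -> Vector:
    """Dense layer without bias: one dot product per weight row."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def relu(x: Sequence[float]) -> Vector:
    return [max(0.0, v) for v in x]


def softmax(logits: Sequence[float]) -> Vector:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def siamese_logits(face_a: Vector, face_b: Vector,
                   shared_w: Matrix, cls_w: Matrix) -> Vector:
    # Both faces go through the SAME weights (the siamese property). A single
    # linear+ReLU layer stands in (assumption) for a fine-tuned VGG-FACE branch.
    feat_a = relu(matvec(shared_w, face_a))
    feat_b = relu(matvec(shared_w, face_b))
    # Concatenate the two branch embeddings and classify the pair.
    return matvec(cls_w, feat_a + feat_b)


def scene_logits(scene_feat: Vector, scene_w: Matrix) -> Vector:
    # A linear classifier stands in (assumption) for the whole-image scene network.
    return matvec(scene_w, scene_feat)


def fuse(p_pair: Vector, p_scene: Vector, alpha: float = 0.5) -> Vector:
    # Assumed late-fusion rule: convex combination of class probabilities.
    return [alpha * a + (1.0 - alpha) * b for a, b in zip(p_pair, p_scene)]


def predict(face_a: Vector, face_b: Vector, scene_feat: Vector,
            shared_w: Matrix, cls_w: Matrix, scene_w: Matrix,
            alpha: float = 0.5) -> int:
    p_pair = softmax(siamese_logits(face_a, face_b, shared_w, cls_w))
    p_scene = softmax(scene_logits(scene_feat, scene_w))
    probs = fuse(p_pair, p_scene, alpha)
    return max(range(len(probs)), key=probs.__getitem__)  # index of top class


if __name__ == "__main__":
    # Toy weights for 3 relation classes: the pair branch favors class 0,
    # the scene branch favors class 2.
    shared_w = [[1.0, 0.0], [0.0, 1.0]]
    cls_w = [[1.0, 0.0, 0.0, 1.0], [0.0] * 4, [0.0, 1.0, 1.0, 0.0]]
    scene_w = [[0.0, 0.0], [0.0, 0.0], [0.0, 3.0]]
    args = ([1.0, 0.0], [0.0, 1.0], [0.0, 1.0], shared_w, cls_w, scene_w)
    print(predict(*args, alpha=0.5))  # 2: the more confident scene branch wins
    print(predict(*args, alpha=0.9))  # 0: the pair branch dominates
```

In this sketch, moving `alpha` toward 1 trusts the face-pair branch more and toward 0 trusts the scene branch more; such a weight would typically be chosen on held-out data. Sharing weights between the two face branches means both faces are embedded in the same feature space, with the parameter cost of a single branch.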