Happiness level prediction with sequential inputs via multiple regressions

This paper presents our submission to the group-level happiness intensity prediction sub-challenge of Emotion Recognition in the Wild (EmotiW 2016). The objective of the sub-challenge is to predict the overall happiness level of a group of people from a single image taken in a natural setting. Both the global scene and the faces of the individuals influence the group-level happiness intensity of the image, so the challenge lies in building a model that captures both factors and combines them appropriately. Our proposed solution combines global and local information: a convolutional neural network extracts discriminative face features, and a recurrent neural network selectively memorizes the important features to perform the group-level happiness prediction. In experimental evaluations, the approach reduces the Root Mean Square Error (RMSE) on the test set by about 0.5 compared with the baseline algorithm, which uses only global information.
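For readers unfamiliar with the evaluation metric, the following is a minimal sketch of how RMSE is computed between predicted and ground-truth happiness intensities. The function name `rmse` and the sample values are illustrative assumptions, not data or code from the paper.

```python
import math


def rmse(predicted, actual):
    """Root Mean Square Error between two equal-length sequences of numbers."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not predicted:
        raise ValueError("inputs must not be empty")
    # Mean of the squared per-image errors, then the square root.
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical per-image intensities (made up for illustration only).
example_predictions = [3.0, 2.0, 4.0]
example_labels = [2.0, 2.0, 5.0]
print(rmse(example_predictions, example_labels))
```

A lower RMSE means the predicted intensities lie closer to the annotated ones on average, with large errors penalized more heavily than small ones.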
