Multi-Channel Pose-Aware Convolution Neural Networks for Multi-View Facial Expression Recognition

Although facial expression recognition (FER) has advanced considerably, recognizing expressions in non-frontal views remains an open challenge because large-scale training data covering varied poses is scarce. To make full use of the limited data, we propose a novel multi-channel pose-aware convolution neural network (MPCNN) that consists of three parts: multi-channel feature extraction, joint multi-scale feature fusion, and pose-aware recognition. The feature extraction part has three sub-CNNs, which learn convolutional features from different input features. The joint fusion part hierarchically fuses multi-scale features to enhance the high-level feature representation. The fused features are fed to the pose-aware recognition part, which comprises pose-specific recognition branches and a pose estimation sub-network. MPCNN then classifies the facial expression through a conditional weighted combination of the pose-specific recognition branches, with the weights determined by the estimated pose. MPCNN is end-to-end trainable by minimizing the joint loss of pose and expression recognition. We evaluated the proposed method on two public multi-view FER datasets (BU-3DFE and KDEF) and one in-the-wild FER dataset (SFEW). The experimental results demonstrate that MPCNN outperforms the state-of-the-art FER methods in both within-dataset and cross-dataset settings.
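
The pose-conditioned combination and the joint loss can be illustrated with a minimal sketch. The abstract does not give their exact form, so everything here is an assumption made for illustration: the function names are hypothetical, the branch weights are taken to be the softmax probabilities of the pose estimation sub-network, and the joint loss is taken to be a sum of two cross-entropy terms balanced by a hypothetical `pose_weight`.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def pose_weighted_expression(pose_logits, branch_logits):
    """Mix pose-specific expression predictions by pose probability.

    pose_logits:   one score per pose class (assumed pose sub-network output).
    branch_logits: one list of expression scores per pose-specific branch.
    Assumption: the weights are the softmax pose probabilities.
    """
    pose_probs = softmax(pose_logits)
    branch_probs = [softmax(b) for b in branch_logits]
    n_expr = len(branch_probs[0])
    return [
        sum(p * bp[k] for p, bp in zip(pose_probs, branch_probs))
        for k in range(n_expr)
    ]

def joint_loss(pose_logits, branch_logits, pose_label, expr_label, pose_weight=1.0):
    """Assumed joint loss: expression cross-entropy plus weighted pose cross-entropy."""
    pose_probs = softmax(pose_logits)
    expr_probs = pose_weighted_expression(pose_logits, branch_logits)
    expr_ce = -math.log(expr_probs[expr_label])
    pose_ce = -math.log(pose_probs[pose_label])
    return expr_ce + pose_weight * pose_ce

# Toy example: 3 pose classes, 3 expression classes.
pose_logits = [2.0, 0.0, 0.0]          # pose 0 is the most likely
branch_logits = [
    [0.0, 3.0, 0.0],                   # branch for pose 0 favors expression 1
    [3.0, 0.0, 0.0],                   # branch for pose 1 favors expression 0
    [0.0, 0.0, 3.0],                   # branch for pose 2 favors expression 2
]
expr_probs = pose_weighted_expression(pose_logits, branch_logits)
predicted = max(range(len(expr_probs)), key=lambda k: expr_probs[k])
loss = joint_loss(pose_logits, branch_logits, pose_label=0, expr_label=1)
```

Because each branch outputs a probability distribution and the pose weights sum to one, the mixture is itself a valid distribution over expressions. In this toy case the prediction follows the branch of the dominant pose, while the other branches still contribute in proportion to the pose uncertainty.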