Hierarchical committee of deep convolutional neural networks for robust facial expression recognition

This paper describes our approach towards robust facial expression recognition (FER) for the third Emotion Recognition in the Wild (EmotiW2015) challenge. We train multiple deep convolutional neural networks (deep CNNs) as committee members and combine their decisions. To improve this committee of deep CNNs, we present two strategies: (1) in order to obtain diverse decisions from deep CNNs, we vary network architecture, input normalization, and random weight initialization in training these deep models, and (2) in order to form a better committee in structural and decisional aspects, we construct a hierarchical architecture of the committee with exponentially-weighted decision fusion. In solving a seven-class problem of static FER in the wild for the EmotiW2015, we achieve a test accuracy of 61.6 %. Moreover, on other public FER databases, our hierarchical committee of deep CNNs yields superior performance, outperforming or competing with state-of-the-art results for these databases.

[1]  Anne-Laure Boulesteix,et al.  Microarray-based classification and clinical predictors: on combined classifiers and additional predictive value , 2008, Bioinform..

[2]  Cha Zhang,et al.  Image based Static Facial Expression Recognition with Multiple Deep Network Learning , 2015, ICMI.

[3]  Yuting Zhang,et al.  Learning to Disentangle Factors of Variation with Manifold Interaction , 2014, ICML.

[4]  Dumitru Erhan,et al.  Training Deep Neural Networks on Noisy Labels with Bootstrapping , 2014, ICLR.

[5]  Björn W. Schuller,et al.  AVEC 2011-The First International Audio/Visual Emotion Challenge , 2011, ACII.

[6]  Shiguang Shan,et al.  Enhancing Expression Recognition in the Wild with Unlabeled Reference Data , 2012, ACCV.

[7]  Nitish Srivastava,et al.  Dropout: a simple way to prevent neural networks from overfitting , 2014, J. Mach. Learn. Res..

[8]  Jorma Laaksonen,et al.  Using diversity of errors for selecting members of a committee classifier , 2006, Pattern Recognit..

[9]  Ralph Gross,et al.  An Image Preprocessing Algorithm for Illumination Invariant Face Recognition , 2003, AVBPA.

[10]  Tamás D. Gedeon,et al.  Video and Image based Emotion Recognition Challenges in the Wild: EmotiW 2015 , 2015, ICMI.

[11]  Radu Tudor Ionescu,et al.  Objectness to improve the bag of visual words model , 2014, 2014 IEEE International Conference on Image Processing (ICIP).

[12]  Gonzalo Pajares,et al.  A Hopfield Neural Network for combining classifiers applied to textured images , 2010, Neural Networks.

[13]  Tamás D. Gedeon,et al.  Collecting Large, Richly Annotated Facial-Expression Databases from Movies , 2012, IEEE MultiMedia.

[14]  Yurong Chen,et al.  Capturing AU-Aware Facial Features and Their Latent Relations for Emotion Recognition in the Wild , 2015, ICMI.

[15]  Qiang Yang,et al.  A Survey on Transfer Learning , 2010, IEEE Transactions on Knowledge and Data Engineering.

[16]  Deva Ramanan,et al.  Face detection, pose estimation, and landmark localization in the wild , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[17]  Paul A. Viola,et al.  Robust Real-Time Face Detection , 2001, Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001.

[18]  Aristidis Likas,et al.  Mixture of Experts Classification Using a Hierarchical Mixture Model , 2002, Neural Computation.

[19]  Ludmila I. Kuncheva,et al.  Combining Pattern Classifiers: Methods and Algorithms , 2004 .

[20]  Ludmila I. Kuncheva,et al.  Relationships between combination methods and measures of diversity in combining classifiers , 2002, Inf. Fusion.

[21]  Yichuan Tang,et al.  Deep Learning using Linear Support Vector Machines , 2013, 1306.0239.

[22]  Thomas G. Dietterich Multiple Classifier Systems , 2000, Lecture Notes in Computer Science.

[23]  Amanda J. C. Sharkey,et al.  On Combining Artificial Neural Nets , 1996, Connect. Sci..

[24]  Subhash C. Bagui,et al.  Combining Pattern Classifiers: Methods and Algorithms , 2005, Technometrics.

[25]  Luca Maria Gambardella,et al.  Deep, Big, Simple Neural Nets for Handwritten Digit Recognition , 2010, Neural Computation.

[26]  Carmen García-Mateo,et al.  On combining classifiers for speaker authentication , 2003, Pattern Recognit..

[27]  Maja Pantic,et al.  The first facial expression recognition and analysis challenge , 2011, Face and Gesture 2011.

[28]  James C. Bezdek,et al.  Decision templates for multiple classifier fusion: an experimental comparison , 2001, Pattern Recognit..

[29]  Yaxin Bi,et al.  On combining classifier mass functions for text categorization , 2005, IEEE Transactions on Knowledge and Data Engineering.

[30]  Razvan Pascanu,et al.  Combining modality specific deep neural networks for emotion recognition in video , 2013, ICMI '13.

[31]  Caifeng Shan,et al.  Smile detection by boosting pixel differences , 2012, IEEE Transactions on Image Processing.

[32]  Luca Maria Gambardella,et al.  Deep Big Simple Neural Nets Excel on Handwritten Digit Recognition , 2010, ArXiv.

[33]  Fabio Roli,et al.  Design of effective neural network ensembles for image classification purposes , 2001, Image Vis. Comput..

[34]  Robert A. Jacobs,et al.  Hierarchical Mixtures of Experts and the EM Algorithm , 1993, Neural Computation.

[35]  Jürgen Schmidhuber,et al.  Multi-column deep neural network for traffic sign classification , 2012, Neural Networks.

[36]  Geoffrey E. Hinton,et al.  Adaptive Mixtures of Local Experts , 1991, Neural Computation.

[37]  Jiri Matas,et al.  On Combining Classifiers , 1998, IEEE Trans. Pattern Anal. Mach. Intell..

[38]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[39]  Yoshua Bengio,et al.  Gradient-based learning applied to document recognition , 1998, Proc. IEEE.

[40]  Ling Shao,et al.  Deep Dynamic Neural Networks for Gesture Segmentation and Recognition , 2014, ECCV Workshops.

[41]  Simon Haykin,et al.  GradientBased Learning Applied to Document Recognition , 2001 .

[42]  Soo-Young Lee,et al.  Hierarchical Committee of Deep CNNs with Exponentially-Weighted Decision Fusion for Static Facial Expression Recognition , 2015, ICMI.

[43]  Daoqiang Zhang,et al.  Hierarchical Ensemble of Multi-level Classifiers for Diagnosis of Alzheimer's Disease , 2012, MLMI.

[44]  Christopher Joseph Pal,et al.  Recurrent Neural Networks for Emotion Recognition in Video , 2015, ICMI.

[45]  Vitomir Struc,et al.  Photometric Normalization Techniques for Illumination Invariance , 2011 .

[46]  Gwen Littlewort,et al.  Toward Practical Smile Detection , 2009, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[47]  Lars Kai Hansen,et al.  Neural Network Ensembles , 1990, IEEE Trans. Pattern Anal. Mach. Intell..

[48]  Ching Y. Suen,et al.  The behavior-knowledge space method for combination of multiple classifiers , 1993, Proceedings of IEEE Conference on Computer Vision and Pattern Recognition.

[49]  Radu Tudor Ionescu,et al.  Local Learning to Improve Bag of Visual Words Model for Facial Expression Recognition , 2013 .

[50]  Xiaogang Wang,et al.  Deep Learning Face Representation from Predicting 10,000 Classes , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[51]  Jürgen Schmidhuber,et al.  Multi-column deep neural networks for image classification , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[52]  Shiguang Shan,et al.  Combining Multiple Kernel Methods on Riemannian Manifold for Emotion Recognition in the Wild , 2014, ICMI.

[53]  Christopher Joseph Pal,et al.  Facial Expression Analysis Based on High Dimensional Binary Features , 2014, ECCV Workshops.

[54]  R. Polikar,et al.  Ensemble based systems in decision making , 2006, IEEE Circuits and Systems Magazine.

[55]  Tamás D. Gedeon,et al.  Emotion Recognition In The Wild Challenge 2014: Baseline, Data and Protocol , 2014, ICMI.

[56]  Thomas S. Huang,et al.  Do Deep Neural Networks Learn Facial Action Units When Doing Expression Recognition? , 2015, 2015 IEEE International Conference on Computer Vision Workshop (ICCVW).

[57]  Luca Maria Gambardella,et al.  Convolutional Neural Network Committees for Handwritten Character Classification , 2011, 2011 International Conference on Document Analysis and Recognition.

[58]  Honglak Lee,et al.  Adaptive Multi-Column Deep Neural Networks with Application to Robust Image Denoising , 2013, NIPS.

[59]  Wen Gao,et al.  Hierarchical Ensemble of Global and Local Classifiers for Face Recognition , 2009, IEEE Trans. Image Process..

[60]  Andrea Vedaldi,et al.  MatConvNet: Convolutional Neural Networks for MATLAB , 2014, ACM Multimedia.

[61]  Fernando De la Torre,et al.  Supervised Descent Method and Its Applications to Face Alignment , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[62]  David H. Wolpert,et al.  Stacked generalization , 1992, Neural Networks.

[63]  Pascal Vincent,et al.  Disentangling Factors of Variation for Facial Expression Recognition , 2012, ECCV.

[64]  Yoshua Bengio,et al.  Challenges in representation learning: A report on three machine learning contests , 2013, Neural Networks.