Regularized Multi-Concept MIL for weakly-supervised facial behavior categorization

In this work, we address the problem of estimating high-level semantic labels for videos of recorded people by means of analysing their facial expressions. This problem, to which we refer as facial behavior categorization, is a weakly-supervised learning problem where we do not have access to frame-by-frame facial gesture annotations but only weak-labels at the video level are available. Therefore, the goal is to learn a set of discriminative expressions appearing during the training videos and how they determine these labels. Facial behavior categorization can be posed as a Multi-Instance-Learning (MIL) problem and we propose a novel MIL method called Regularized Multi-Concept MIL to solve it. In contrast to previous approaches applied in facial behavior analysis, RMC-MIL follows a Multi-Concept assumption which allows different facial expressions (concepts) to contribute differently to the video-label. Moreover, to handle with the high-dimensional nature of facial-descriptors, RMC-MIL uses a discriminative approach to model the concepts and structured sparsity regularization to discard non-informative features. RMC-MIL is posed as a convex-constrained optimization problem where all the parameters are jointly learned using the Projected-Quasi-Newton method. In our experiments, we use two public data-sets to show the advantages of the Regularized MultiConcept approach and its improvement compared to existing MIL methods. RMC-MIL outperforms state-of-the-art results in the UNBC data-set for pain detection.

[1]  Dan Zhang,et al.  A Discriminative Data-Dependent Mixture-Model Approach for Multiple Instance Learning in Image Classification , 2012, ECCV.

[2]  Massimiliano Pontil,et al.  Multi-Task Feature Learning , 2006, NIPS.

[3]  Daniel McDuff,et al.  Affectiva-MIT Facial Expression Dataset (AM-FED): Naturalistic and Spontaneous Facial Expressions Collected "In-the-Wild" , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition Workshops.

[4]  Fernando De la Torre,et al.  Gaussian Processes Multiple Instance Learning , 2010, ICML.

[5]  Jorge Nocedal,et al.  Representations of quasi-Newton matrices and their use in limited memory methods , 1994, Math. Program..

[6]  Mubarak Shah,et al.  A 3-dimensional sift descriptor and its application to action recognition , 2007, ACM Multimedia.

[7]  Maja Pantic,et al.  Meta-Analysis of the First Facial Expression Recognition Challenge , 2012, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics).

[8]  Daniel McDuff,et al.  Predicting online media effectiveness based on smile responses gathered over the Internet , 2013, 2013 10th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG).

[9]  Fernando De la Torre,et al.  Selective Transfer Machine for Personalized Facial Action Unit Detection , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[10]  Yixin Chen,et al.  MILES: Multiple-Instance Learning via Embedded Instance Selection , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[11]  Xuelong Li,et al.  Image Annotation by Multiple-Instance Learning With Discriminative Feature Mapping and Selection , 2014, IEEE Transactions on Cybernetics.

[12]  Scott T. Rickard,et al.  Comparing Measures of Sparsity , 2008, IEEE Transactions on Information Theory.

[13]  Yixin Chen,et al.  Image Categorization by Learning and Reasoning with Regions , 2004, J. Mach. Learn. Res..

[14]  Murat Dundar,et al.  Bayesian multiple instance learning: automatic feature selection and inductive transfer , 2008, ICML '08.

[15]  Cristian Sminchisescu,et al.  Convex Multiple-Instance Learning by Estimating Likelihood Ratio , 2010, NIPS.

[16]  Thomas Hofmann,et al.  Support Vector Machines for Multiple-Instance Learning , 2002, NIPS.

[17]  Tomás Lozano-Pérez,et al.  A Framework for Multiple-Instance Learning , 1997, NIPS.

[18]  Marian Stewart Bartlett,et al.  Weakly supervised pain localization using multiple instance learning , 2013, 2013 10th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG).

[19]  Horst Bischof,et al.  MIForests: Multiple-Instance Learning with Randomized Trees , 2010, ECCV.

[20]  Fernando De la Torre,et al.  Supervised Descent Method and Its Applications to Face Alignment , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[21]  A. Ng Feature selection, L1 vs. L2 regularization, and rotational invariance , 2004, Twenty-first international conference on Machine learning - ICML '04.

[22]  Maja Pantic,et al.  The Detection of Concept Frames Using Clustering Multi-instance Learning , 2010, 2010 20th International Conference on Pattern Recognition.

[23]  Fernando De la Torre,et al.  Facial Expression Analysis , 2011, Visual Analysis of Humans.

[24]  Jun Zhou,et al.  MILIS: Multiple Instance Learning with Instance Selection , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[25]  Jeffrey F. Cohn,et al.  Painful data: The UNBC-McMaster shoulder pain expression archive database , 2011, Face and Gesture 2011.

[26]  James R. Foulds,et al.  A review of multi-instance learning assumptions , 2010, The Knowledge Engineering Review.

[27]  Paul A. Viola,et al.  Multiple Instance Boosting for Object Detection , 2005, NIPS.

[28]  Razvan C. Bunescu,et al.  Multiple instance learning for sparse positive bags , 2007, ICML '07.

[29]  Qingshan Liu,et al.  Learning active facial patches for expression analysis , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[30]  José Mario Martínez,et al.  Nonmonotone Spectral Projected Gradient Methods on Convex Sets , 1999, SIAM J. Optim..

[31]  Mark W. Schmidt,et al.  Optimizing Costly Functions with Simple Constraints: A Limited-Memory Projected Quasi-Newton Algorithm , 2009, AISTATS.