Weakly supervised pain localization using multiple instance learning

Automatic pain recognition from videos is a vital clinical application and, owing to the spontaneous nature of pain expressions, poses interesting challenges to automatic facial expression recognition (AFER) research. Previous pain vs no-pain systems have highlighted two major challenges: (1) ground truth is provided only at the sequence level, so the presence or absence of the target expression in any given frame is unknown, and (2) the time point and duration of the pain expression event(s) in each video are unknown. To address these issues we propose a novel framework (referred to as MS-MIL) in which each sequence is represented as a bag containing multiple segments, and multiple instance learning (MIL) is employed to handle this weakly labeled data, for which ground truth is available only at the sequence level. The segments are generated either by multiple clusterings of a sequence or by running a multi-scale temporal scanning window, and are represented using a state-of-the-art Bag of Words (BoW) representation. This work extends the idea of detecting facial expressions through 'concept frames' to 'concept segments' and argues, through extensive experiments, that algorithms like MIL are needed to reap the benefits of such a representation. The key advantages of our approach are: (1) joint detection and localization of painful frames using only sequence-level ground truth, (2) incorporation of temporal dynamics by representing the data as segments rather than as individual frames, and (3) extraction of multiple segments, which is well suited to signals whose temporal location and duration in the video are uncertain. Experiments on the UNBC-McMaster Shoulder Pain dataset highlight the effectiveness of our approach, which achieves promising results on the problem of pain detection in videos.
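
The bag-of-segments idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: the window sizes, the 50% overlap, the max-pooling rule for the bag score, and the frame-level rule (each frame takes the highest score among the segments covering it) are all assumptions chosen for illustration, and the function names `multiscale_segments`, `bag_score`, and `frame_scores` are hypothetical. Segment scores are assumed to come from some already trained segment-level classifier, which is not shown.

```python
from typing import List, Sequence, Tuple

Segment = Tuple[int, int]  # half-open frame range [start, end)


def multiscale_segments(
    n_frames: int,
    window_sizes: Sequence[int] = (8, 16, 32),  # assumed scales, not from the paper
    overlap: float = 0.5,                        # assumed overlap between windows
) -> List[Segment]:
    """Slide windows of several lengths over a sequence to build one bag."""
    segments: List[Segment] = []
    for w in window_sizes:
        if w > n_frames:
            continue  # window longer than the sequence: skip this scale
        stride = max(1, int(round(w * (1.0 - overlap))))
        for start in range(0, n_frames - w + 1, stride):
            segments.append((start, start + w))
    if not segments and n_frames > 0:
        # Fallback: a very short sequence becomes a single segment.
        segments.append((0, n_frames))
    return segments


def bag_score(segment_scores: Sequence[float]) -> float:
    """Standard MIL assumption: a bag is positive if any instance is,
    so the sequence score is the maximum segment score."""
    return max(segment_scores)


def frame_scores(
    n_frames: int,
    segments: Sequence[Segment],
    segment_scores: Sequence[float],
) -> List[float]:
    """Assumed localization rule: a frame inherits the highest score of any
    segment covering it; uncovered frames keep -inf."""
    scores = [float("-inf")] * n_frames
    for (start, end), s in zip(segments, segment_scores):
        for t in range(start, end):
            scores[t] = max(scores[t], s)
    return scores


if __name__ == "__main__":
    segs = multiscale_segments(10, window_sizes=(4,), overlap=0.5)
    seg_scores = [0.1, 0.9, 0.2, 0.3]  # made-up classifier outputs
    print(segs)
    print(bag_score(seg_scores))
    print(frame_scores(10, segs, seg_scores))
```

In this sketch only the sequence-level label would be needed for training, while the per-frame scores give a rough localization of where the high-scoring segments fall.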
