Facial communicative signal interpretation in human-robot interaction by discriminative video subsequence selection

Facial communicative signals (FCSs) such as head gestures, eye gaze, and facial expressions can provide useful feedback in conversations between people as well as in human-robot interaction. This paper presents a pattern recognition approach for interpreting FCSs in terms of valence, based on the selection of discriminative subsequences in video data. These subsequences capture important temporal dynamics and serve as prototypical reference subsequences in a classification procedure that combines dynamic time warping with feature extraction using active appearance models. Using this valence classification, the robot can distinguish positive from negative interaction situations and react accordingly. The approach is evaluated on a database of videos in which people interact with a robot by teaching it the names of several objects. The robot's verbal answer is expected to elicit spontaneous FCSs from the human tutor, and these displays are the signals classified in this work. The achieved classification accuracies are comparable to average human recognition performance and outperform our previous results on this task.
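The classification scheme described above, matching a query video against prototypical reference subsequences under dynamic time warping, can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the function names and the use of per-frame feature vectors (standing in for active appearance model parameters) are assumptions for the example.

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic time warping distance between two feature sequences.

    a, b: arrays of shape (T1, d) and (T2, d), e.g. per-frame feature
    vectors such as AAM parameters (illustrative, not the paper's code).
    """
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Local frame-to-frame distance plus the cheapest warping step.
            d = np.linalg.norm(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j],      # insertion
                                 cost[i, j - 1],      # deletion
                                 cost[i - 1, j - 1])  # match
    return cost[n, m]

def classify(query, prototypes, labels):
    """Nearest-prototype classification: assign the label of the
    discriminative reference subsequence closest to the query under DTW."""
    dists = [dtw_distance(query, p) for p in prototypes]
    return labels[int(np.argmin(dists))]
```

In this reading, the discriminative subsequence selection determines which labeled `prototypes` enter the comparison, and DTW tolerates the differing speeds at which tutors display the same facial signal.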
