Online nod detection in human-robot interaction

Nodding is an important factor in human communication, providing a physical cue for socially communicative acts such as turn taking, backchanneling, and confirmation. In this article, we describe a vision-based online head nod detector that works with monocular camera images. Using SVM regression, our system estimates the head pose from facial landmarks. Subsequence dynamic time warping (DTW) is then used to compare head pose features against nod templates. In contrast to many previous implementations, our system was evaluated with study participants who were not instructed to reply by nodding, and it shows good results while maintaining a low false positive rate.
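The subsequence matching step described above can be illustrated with a minimal sketch. It is not the system's actual implementation: the function name `subsequence_dtw`, the one-dimensional pitch-like signal, the absolute-difference local cost, and the example template are all assumptions made for illustration. The abstract does not specify the features, templates, or decision threshold used.

```python
import math


def subsequence_dtw(template, stream):
    """Find the subsequence of `stream` that best matches `template`.

    Returns (cost, start, end), where start and end are 0-based, inclusive
    indices into `stream`. Hypothetical helper, for illustration only.
    """
    n, m = len(template), len(stream)
    if n == 0 or m == 0:
        raise ValueError("template and stream must be non-empty")

    inf = math.inf
    # acc[i][j]: minimal cost of aligning template[:i] with some
    # subsequence of the stream that ends at stream[j - 1].
    # Row 0 is all zeros, so a match may start at any stream position.
    acc = [[0.0] * (m + 1)] + [[inf] * (m + 1) for _ in range(n)]

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Assumed local cost: absolute difference of scalar features.
            d = abs(template[i - 1] - stream[j - 1])
            acc[i][j] = d + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])

    # A match may also end anywhere: take the cheapest cell in the last row.
    end = min(range(1, m + 1), key=lambda j: acc[n][j])
    cost = acc[n][end]

    # Backtrack along cheapest predecessors to recover the start position.
    i, j = n, end
    while i > 1:
        _, i, j = min(
            (acc[i - 1][j - 1], i - 1, j - 1),
            (acc[i - 1][j], i - 1, j),
            (acc[i][j - 1], i, j - 1),
        )
    return cost, j - 1, end - 1


# Assumed toy data: one down-up oscillation as a "nod" template,
# embedded in an otherwise still head pose signal.
nod_template = [0, 1, 0, -1, 0]
pose_stream = [0, 0, 0, 0, 0, 1, 0, -1, 0, 0, 0]
cost, start, end = subsequence_dtw(nod_template, pose_stream)
```

In an online setting, a detector along these lines would presumably run the match over a sliding window of recent frames and report a nod when the cost falls below a threshold; both the window length and the threshold would need to be tuned on data and are not given in the abstract.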
