Emotional Expression Classification Using Time-Series Kernels

Estimation of facial expressions, as spatio-temporal processes, can take advantage of kernel methods if one considers facial landmark positions and their motion in 3D space. We applied support vector classification with kernels derived from dynamic time-warping similarity measures. We achieved over 99% accuracy - measured by area under ROC curve - using only the 'motion pattern' of the PCA compressed representation of the marker point vector, the so-called shape parameters. Beyond the classification of full motion patterns, several expressions were recognized with over 90% accuracy in as few as 5-6 frames from their onset, about 200 milliseconds.

[1]  András Lörincz,et al.  3D shape estimation in video sequences provides high precision evaluation of facial expressions , 2012, Image Vis. Comput..

[2]  Chih-Jen Lin,et al.  LIBSVM: A library for support vector machines , 2011, TIST.

[3]  Qingshan Liu,et al.  Boosting encoded dynamic features for facial expression recognition , 2009, Pattern Recognit. Lett..

[4]  Marian Stewart Bartlett,et al.  Facial expression recognition using Gabor motion energy filters , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition - Workshops.

[5]  MatthewsIain,et al.  Active Appearance Models Revisited , 2004 .

[6]  Maja Pantic,et al.  Combined Support Vector Machines and Hidden Markov Models for Modeling Facial Action Temporal Dynamics , 2007, ICCV-HCI.

[7]  Gwen Littlewort,et al.  Learning spatiotemporal features by using independent component analysis with application to facial expression recognition , 2012, Neurocomputing.

[8]  Simon Lucey,et al.  Deformable Model Fitting by Regularized Landmark Mean-Shift , 2010, International Journal of Computer Vision.

[9]  Fernando De la Torre,et al.  Continuous AU intensity estimation using localized, sparse facial feature space , 2013, 2013 10th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG).

[10]  Eamonn J. Keogh,et al.  Searching and Mining Trillions of Time Series Subsequences under Dynamic Time Warping , 2012, KDD.

[11]  Takeo Kanade,et al.  Comprehensive database for facial expression analysis , 2000, Proceedings Fourth IEEE International Conference on Automatic Face and Gesture Recognition (Cat. No. PR00580).

[12]  S. Chiba,et al.  Dynamic programming algorithm optimization for spoken word recognition , 1978 .

[13]  Maja Pantic,et al.  Modeling hidden dynamics of multimodal cues for spontaneous agreement and disagreement recognition , 2011, Face and Gesture 2011.

[14]  Gunnar Rätsch,et al.  Predicting Time Series with Support Vector Machines , 1997, ICANN.

[15]  Stefanos Zafeiriou,et al.  Infinite Hidden Conditional Random Fields for Human Behavior Analysis , 2013, IEEE Transactions on Neural Networks and Learning Systems.

[16]  Simon Baker,et al.  Active Appearance Models Revisited , 2004, International Journal of Computer Vision.

[17]  Marco Cuturi,et al.  Fast Global Alignment Kernels , 2011, ICML.

[18]  Claus Bahlmann,et al.  Learning with Distance Substitution Kernels , 2004, DAGM-Symposium.

[19]  Tomoko Matsui,et al.  A Kernel for Time Series Based on Global Alignments , 2006, 2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07.

[20]  John H. L. Hansen,et al.  Discrete-Time Processing of Speech Signals , 1993 .

[21]  N. Higham Computing the nearest correlation matrix—a problem from finance , 2002 .

[22]  Takeo Kanade,et al.  The Extended Cohn-Kanade Dataset (CK+): A complete dataset for action unit and emotion-specified expression , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition - Workshops.

[23]  H. Stanley,et al.  Switching processes in financial markets , 2011, Proceedings of the National Academy of Sciences.