Unsupervised discovery of facial events

Automatic facial image analysis has been a long-standing research problem in computer vision. A key component in facial image analysis, one that largely determines the success of subsequent algorithms (e.g., facial expression recognition), is the definition of a vocabulary of possible dynamic facial events. To date, that vocabulary has come from the anatomically based Facial Action Coding System (FACS) or from more subjective approaches (i.e., emotion-specified expressions). The aim of this paper is to discover facial events directly from video of naturally occurring facial behavior, without recourse to FACS or other labeling schemes. To discover facial events, we propose a temporal clustering algorithm, Aligned Cluster Analysis (ACA), and a multi-subject correspondence algorithm for matching expressions. We use a variety of video sources: posed facial behavior (Cohn-Kanade database), unscripted facial behavior (RU-FACS database), and video of infants. The accuracy of (unsupervised) ACA approached that of a supervised version; ACA also achieved moderate intersystem agreement with FACS and proved informative as a visualization and summarization tool.
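
The abstract names ACA without giving its mechanics. As a hedged illustration only, the sketch below assumes ACA behaves like kernel k-means over variable-length segments. Segments are compared through a segment kernel; here we assume the dynamic time alignment kernel (DTAK) with a Gaussian frame kernel. Given the current clusters, a dynamic program over segment end points re-segments and relabels the sequence, and the two steps alternate until nothing changes. The function names (`dtak`, `seg_dist`, `aca`), the parameters `n_max` and `sigma`, and the need to pass an initial segmentation are hypothetical choices for this sketch. It is brute force and omits any efficiency tricks or minimum segment length the paper may use.

```python
import math


def frame_kernel(x, y, sigma):
    # Gaussian kernel between two feature vectors (one video frame each).
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-d2 / (2.0 * sigma ** 2))


def dtak(X, Y, sigma=1.0):
    # Dynamic time alignment kernel (assumed segment kernel): score of the best
    # alignment path, normalized by len(X) + len(Y) so identical inputs give 1.
    nx, ny = len(X), len(Y)
    u = [[-math.inf] * (ny + 1) for _ in range(nx + 1)]
    u[0][0] = 0.0
    for i in range(1, nx + 1):
        for j in range(1, ny + 1):
            k = frame_kernel(X[i - 1], Y[j - 1], sigma)
            u[i][j] = max(u[i - 1][j] + k, u[i - 1][j - 1] + 2 * k, u[i][j - 1] + k)
    return u[nx][ny] / (nx + ny)


def seg_dist(Y, members, const, sigma):
    # Kernel-space squared distance from segment Y to the mean of a cluster's segments.
    cross = sum(dtak(Y, A, sigma) for A in members) / len(members)
    return dtak(Y, Y, sigma) - 2.0 * cross + const


def aca(X, n_max, segs, labels, sigma=1.0, n_iter=20):
    """X: list of frames; segs: initial [(start, end)] covering X; labels: one cluster id per segment."""
    for _ in range(n_iter):
        # Step 1: collect each cluster's segments from the current segmentation.
        # Clusters left without segments simply disappear.
        members = {}
        for (s, e), c in zip(segs, labels):
            members.setdefault(c, []).append(X[s:e])
        const = {c: sum(dtak(A, B, sigma) for A in M for B in M) / len(M) ** 2
                 for c, M in members.items()}
        # Step 2: dynamic programming over segment end points.
        # Longer segments are tried first, so ties favour fewer, longer segments.
        T = len(X)
        cost = [0.0] + [math.inf] * T
        back = [None] * (T + 1)
        for v in range(1, T + 1):
            for length in range(min(n_max, v), 0, -1):
                seg = X[v - length:v]
                for c, M in members.items():
                    val = cost[v - length] + seg_dist(seg, M, const[c], sigma)
                    if val < cost[v]:
                        cost[v], back[v] = val, (v - length, c)
        # Step 3: backtrack the optimal segmentation and its labels.
        new_segs, new_labels, v = [], [], T
        while v > 0:
            s, c = back[v]
            new_segs.append((s, v))
            new_labels.append(c)
            v = s
        new_segs.reverse()
        new_labels.reverse()
        if new_segs == segs and new_labels == labels:
            break  # converged: segmentation and labels unchanged
        segs, labels = new_segs, new_labels
    return segs, labels
```

As a usage example, take a toy 1-D sequence `[[0.0]] * 4 + [[3.0]] * 4 + [[0.0]] * 4 + [[3.0]] * 4`. Start from one segment per frame with frame-level labels, and call `aca(X, 4, segs, labels)`. The sketch then merges frames into the segments `[(0, 4), (4, 8), (8, 12), (12, 16)]` with labels `[0, 1, 0, 1]`. Trying longer segments first matters here. The objective is an unweighted sum of per-segment distances, so without that ordering a homogeneous run could be split into single frames at no extra cost.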

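The abstract also mentions a multi-subject correspondence algorithm for matching expressions. A minimal sketch rests on several assumptions. Each subject's clusters are summarized by a centroid vector, and corresponding clusters across subjects should lie close together in Euclidean distance. For two subjects this is a linear assignment problem, which the sketch solves by exhaustive search over permutations; that is only feasible for a handful of clusters. More subjects are handled by matching each one to the first subject. This reference-subject simplification, the Euclidean cost, and the names `match_two` and `match_subjects` are hypothetical. They are not presented as the paper's multi-subject algorithm.

```python
import math
from itertools import permutations


def match_two(ref, other):
    """Return (perm, cost) where other[perm[i]] is matched to ref[i], minimizing total distance."""
    k = len(ref)
    best_cost, best_perm = math.inf, None
    for perm in permutations(range(k)):  # exhaustive search: k! candidates
        cost = sum(math.dist(ref[i], other[perm[i]]) for i in range(k))
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    return list(best_perm), best_cost


def match_subjects(centroids):
    # Simplification: align every subject's clusters to those of the first subject.
    ref = centroids[0]
    return [match_two(ref, c)[0] for c in centroids]
```

For example, suppose the reference centroids are `[(0, 0), (10, 0), (0, 10)]` and a second subject's centroids are `[(0, 9), (1, 0), (9, 1)]`. Then `match_two` pairs them through the permutation `[1, 2, 0]`.
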