Value-directed learning of gestures and facial displays

This paper presents a method for learning decision-theoretic models of facial displays and gestures from video data. We take the view that the meaning of a facial display or gesture to an observer lies in its relationship to context, actions, and outcomes. An agent wishing to capitalize on these relationships must distinguish facial displays and gestures according to how they help it maximize utility. This paper demonstrates how an agent can learn the relationships between unlabeled observations of a person's face and gestures, the context, and its own actions and utility function. The agent needs no prior knowledge of the number or structure of the gestures and facial displays that are worth distinguishing. It discovers classes of human non-verbal behavior, as well as which of them matter for choosing actions that maximize the expected utility of outcomes. This value-directed model learning lets an agent focus resources on recognizing only those behaviors that are useful to distinguish. We present results on a simple gestural robotic control problem and on a card game played by two human players.
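To make the value-directed idea concrete, the following minimal sketch is our own toy construction, not the authors' implementation: the two-command task, the use of k-means for discovering display classes, and all function names are assumptions. It clusters unlabeled 2-D "gesture" features under several candidate numbers of display classes, learns a cluster-to-action policy from reward, and scores each candidate model by its expected utility.

```python
# Toy illustration of value-directed model selection (assumptions
# throughout; not the paper's algorithm). An agent observes unlabeled
# 2-D "gesture" features, clusters them with k-means for several
# candidate numbers of display classes, learns a cluster -> action
# policy from reward, and scores each model by expected utility.
import numpy as np

rng = np.random.default_rng(0)

def sample_episode(n):
    """Hidden commands in {0, 1} produce noisy 2-D feature vectors."""
    commands = rng.integers(0, 2, size=n)
    means = np.array([[-1.0, 0.0], [1.0, 0.0]])
    feats = means[commands] + 0.6 * rng.standard_normal((n, 2))
    return feats, commands

def kmeans(x, k, iters=50):
    """Plain k-means; returns the cluster centroids."""
    centroids = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((x[:, None] - centroids) ** 2).sum(-1), axis=1)
        for j in range(k):
            pts = x[labels == j]
            if len(pts):
                centroids[j] = pts.mean(axis=0)
    return centroids

def assign(x, centroids):
    """Label each feature vector with its nearest centroid."""
    return np.argmin(((x[:, None] - centroids) ** 2).sum(-1), axis=1)

def utility_of_model(k, n_train=2000, n_eval=2000):
    """Learn a cluster -> action policy from reward, then score it."""
    x, g = sample_episode(n_train)
    c = kmeans(x, k)
    z = assign(x, c)
    # Average reward of each (cluster, action) pair under random
    # exploration: +1 when the action matches the hidden command, -1
    # otherwise. Counts start at 1 to avoid division by zero.
    q = np.zeros((k, 2))
    counts = np.ones((k, 2))
    actions = rng.integers(0, 2, size=n_train)
    rewards = np.where(actions == g, 1.0, -1.0)
    np.add.at(q, (z, actions), rewards)
    np.add.at(counts, (z, actions), 1.0)
    policy = np.argmax(q / counts, axis=1)
    # Evaluate the greedy policy on fresh episodes.
    xe, ge = sample_episode(n_eval)
    ze = assign(xe, c)
    return np.mean(np.where(policy[ze] == ge, 1.0, -1.0))

for k in (1, 2, 3, 4):
    print(f"k={k} display classes: expected utility {utility_of_model(k):+.2f}")
# A single class cannot separate the two commands (utility near 0);
# two or more classes recover the distinction, and extra classes add
# no value, so the smallest sufficient model is the one worth keeping.
```

Running the loop shows utility jumping once the model has enough classes to separate the two commands and plateauing afterward, which is the sense in which only utility-relevant distinctions are worth modeling.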
