Paper Information - A Six-Unit Network is All You Need to Discover Happiness

A Six-Unit Network is All You Need to Discover Happiness

Matthew N. Dailey, Garrison W. Cottrell
{mdailey,gary}@cs.ucsd.edu
UCSD Computer Science and Engineering
9500 Gilman Dr., La Jolla, CA 92093-0114 USA

Ralph Adolphs
ralph-adolphs@uiowa.edu
University of Iowa Department of Neurology
220 Hawkins Dr., Iowa City, IA 52242 USA

Abstract

In this paper, we build upon previous results to show that our facial expression recognition system, an extremely simple neural network containing six units, trained by backpropagation, is a surprisingly good computational model that obtains a natural fit to human data from experiments that utilize a forced-choice classification paradigm. The model begins by computing a biologically plausible representation of its input, which is a static image of an actor portraying a prototypical expression of either Happiness, Sadness, Fear, Anger, Surprise, Disgust, or Neutrality. This representation of the input is fed to a single-layer neural network containing six units, one for each non-neutral facial expression. Once trained, the network's response to face stimuli can be subjected to a variety of "cognitive" measures and compared to human performance in analogous tasks. In some cases, the fit is even better than one might expect from an impoverished network that has no knowledge of culture or social interaction. The results provide insights into some of the perceptual mechanisms that may underlie human social behavior, and we suggest that the system is a good model for one of the ways in which the brain utilizes information in the early visual system to help guide high-level decisions.

Introduction

In this paper, we report on recent progress in understanding human facial expression perception via computational modeling. Our research has resulted in a facial expression recognition system that is capable of discriminating prototypical displays of Happiness, Sadness, Fear, Anger, Surprise, and Disgust at roughly the level of an untrained human. We propose that the system provides a good model of the perceptual mechanisms and decision-making processes involved in a human's ability to perform forced-choice identification of the same facial expressions. The present series of experiments provides significant evidence for this claim.

One of the ongoing debates in the psychological literature on emotion centers on the structure of emotion space. On one view, there is a set of discrete basic emotions that are fundamentally different in terms of physiology, means of appraisal, typical behavioral response, etc. (Ekman, 1999). Facial expressions, according to this categorical view, are universal signals of these basic emotions. Another prominent view is that emotion concepts are best thought of as prototypes in a continuous, low-dimensional space of possible emotional states, and that facial expressions are mere clues that allow an observer to locate an approximate region in this space (e.g. Russell, 1980; Carroll and Russell, 1996).

One type of evidence sometimes taken as support for categorical theories of emotion involves experiments that show "categorical perception" of facial expressions (Etcoff and Magee, 1992; Young et al., 1997). Categorical perception is a discontinuity characterized by sharp perceptual category boundaries and better discrimination near those boundaries, as in the bands of color in a rainbow. But as research in the classification literature has shown (e.g. Ellison and Massaro, 1997), seemingly categorical effects naturally arise when an observer is asked to employ a decision criterion based on continuous information.
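The point that a decision criterion applied to continuous information can yield seemingly categorical behavior can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the cited studies: the 11-step morph continuum, evidence that rises linearly along it, Gaussian decision noise, and a criterion fixed at 0.5.

```python
import math

def identification_prob(evidence, criterion=0.5, noise_sd=0.1):
    """P(respond 'B') when evidence plus Gaussian noise is compared to a criterion.

    The Gaussian noise model is an illustrative assumption.
    """
    z = (evidence - criterion) / (noise_sd * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))

# Hypothetical continuum: evidence for category B rises linearly from 0 to 1.
steps = [i / 10 for i in range(11)]
probs = [identification_prob(e) for e in steps]

# Change in response probability between neighbouring steps,
# used here only as a crude proxy for how distinguishable they are.
diffs = [probs[i + 1] - probs[i] for i in range(len(probs) - 1)]

for e, p in zip(steps, probs):
    print(f"evidence={e:.1f}  P(choose B)={p:.3f}")
```

Although the underlying evidence changes by the same amount at every step, the response probabilities stay near 0 or 1 at the ends of the continuum and change almost entirely across the few steps around the criterion.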
Neural networks also possess this dual nature; many networks trained at classification tasks map continuous input features into a continuous output space, but when we apply a decision criterion (such as "choose the biggest output") we may obtain the appearance of sharp category boundaries and high discrimination near those boundaries, as in categorical perception.

Our model, which combines a biologically plausible input representation with a simple form of categorization (a six-unit softmax neural network), is able to account for several types of data from human forced-choice expression recognition experiments. Though we would not actually propose a localist representation of the facial expression category decision (we of course imagine a more distributed representation), the evidence leads us to propose 1) that the model's input representation bears a close relationship to the representation employed by the human visual system for the expression recognition task, and 2) that a dual continuous/categorical model, in which a continuous representation of facial expressions coexists with a discrete decision process (either of which could be tapped by appropriate tasks), may be a more appropriate way to frame human facial expression recognition than either a strictly categorical or strictly continuous model.

The Expression Classification Model

For an overview of our computational model, refer to Figure 1. The system takes a grayscale image as input, computes responses to a lattice of localized, oriented spatial filters (Gabor filters) and reduces the resulting high dimensional input by unsupervised dimensionality reduction (Principal Components Analysis). The resulting low-dimensional representation is then fed to a single-layer neural network with six softmax units (whose sum is constrained to be 1.0), each corresponding to one expression category. We now describe each of the components of the model in more detail.
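As a rough illustration of the three stages just outlined (Gabor filtering on a lattice, PCA, and a six-unit softmax layer followed by a "choose the biggest output" decision), here is a minimal NumPy sketch. It is not the authors' implementation. The function names, the lattice size, filter size, wavelengths, number of orientations, number of principal components, and the random stand-in images are all hypothetical placeholders, and the weights are left untrained, whereas the paper trains the network by backpropagation.

```python
import numpy as np

def gabor_kernel(size, wavelength, theta, sigma):
    """Even and odd parts of a 2-D Gabor filter (Gaussian-windowed sinusoid)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x**2 + y**2) / (2.0 * sigma**2))
    phase = 2.0 * np.pi * xr / wavelength
    return envelope * np.cos(phase), envelope * np.sin(phase)

def gabor_features(image, grid=8, size=15, wavelengths=(4, 8), n_orient=4):
    """Gabor magnitude responses sampled on a grid x grid lattice of locations."""
    h, w = image.shape
    half = size // 2
    rows = np.linspace(half, h - half - 1, grid).astype(int)
    cols = np.linspace(half, w - half - 1, grid).astype(int)
    feats = []
    for lam in wavelengths:
        for k in range(n_orient):
            even, odd = gabor_kernel(size, lam, np.pi * k / n_orient, lam / 2.0)
            for r in rows:
                for c in cols:
                    patch = image[r - half:r + half + 1, c - half:c + half + 1]
                    feats.append(np.hypot((patch * even).sum(), (patch * odd).sum()))
    return np.array(feats)

def fit_pca(X, n_components):
    """Return the data mean and the top principal directions (rows)."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def softmax(z):
    """Row-wise softmax; each row of the result sums to 1."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

EXPRESSIONS = ["Happiness", "Sadness", "Fear", "Anger", "Surprise", "Disgust"]

rng = np.random.default_rng(0)
images = rng.random((20, 64, 64))            # random stand-ins for grayscale faces
X = np.stack([gabor_features(img) for img in images])
mean, components = fit_pca(X, n_components=10)
Z = (X - mean) @ components.T                # low-dimensional representation

W = rng.normal(scale=0.1, size=(10, 6))      # untrained placeholder weights
b = np.zeros(6)
outputs = softmax(Z @ W + b)                 # continuous outputs, one per expression
decisions = [EXPRESSIONS[i] for i in outputs.argmax(axis=1)]  # discrete decision
```

The `outputs` array is the continuous side of the model and `decisions` is the discrete side; which of the two is compared to human data would depend on the task being modeled.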
The Training Set: Pictures of Facial Affect

The model's training set is Ekman and Friesen's Pictures of Facial Affect (POFA, 1976). This database is a good

Garrison W. Cottrell | Matthew N. Dailey | Ralph Adolphs

[1] M. Katsikitis, et al. The Classification of Facial Expressions of Emotion: A Multidimensional-Scaling Approach, 1997, Perception.

[2] J. Daugman. Uncertainty relation for resolution in space, spatial frequency, and orientation optimized by two-dimensional visual cortical filters, 1985, Journal of the Optical Society of America. A, Optics and image science.

[3] D. Massaro, et al. Featural evaluation, integration, and judgment of facial affect, 1997, Journal of experimental psychology. Human perception and performance.

[4] Heekuck Oh, et al. Neural Networks for Pattern Recognition, 1993, Adv. Comput.

[5] Marian Stewart Bartlett, et al. Classifying Facial Actions, 1999, IEEE Trans. Pattern Anal. Mach. Intell.

[6] Norbert Krüger, et al. Face recognition by elastic bunch graph matching, 1997, Proceedings of International Conference on Image Processing.

[7] Garrison W. Cottrell, et al. PCA = Gabor for Expression Recognition, 1999.

[8] Michael J. Lyons, et al. Automatic Classification of Single Facial Images, 1999, IEEE Trans. Pattern Anal. Mach. Intell.

[9] G. Cottrell, et al. A Simple Neural Network Models Categorical Perception of Facial Expressions, 1998.

[10] J. M. Carroll, et al. Do facial expressions signal specific emotions? Judging emotion from the face in context, 1996, Journal of personality and social psychology.

[11] John J. Magee, et al. Categorical perception of facial expressions, 1992, Cognition.