GANmut: Learning Interpretable Conditional Space for Gamut of Emotions

Humans can communicate emotions through a plethora of facial expressions, each with its own intensity, nuances and ambiguities. The generation of such variety by means of conditional GANs is limited to the expressions encoded in the used label system. These limitations are caused either due to burdensome labelling demand or the confounded label space. On the other hand, learning from inexpensive and intuitive basic categorical emotion labels leads to limited emotion variability. In this paper, we propose a novel GAN-based framework that learns an expressive and interpretable conditional space (usable as a label space) of emotions, instead of conditioning on handcrafted labels. Our framework only uses the categorical labels of basic emotions to learn jointly the conditional space as well as emotion manipulation. Such learning can benefit from the image variability within discrete labels, especially when the intrinsic labels reside beyond the discrete space of the defined. Our experiments demonstrate the effectiveness of the proposed framework, by allowing us to control and generate a gamut of complex and compound emotions while using only the basic categorical emotion labels during training. Our source code is available at https://github.com/stefanodapolito/GANmut.

[1]  Luc Van Gool,et al.  SMILE: Semantically-guided Multi-attribute Image and Layout Editing , 2020, 2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW).

[2]  Aaron Hertzmann,et al.  GANSpace: Discovering Interpretable GAN Controls , 2020, NeurIPS.

[3]  Artem Babenko,et al.  Unsupervised Discovery of Interpretable Directions in the GAN Latent Space , 2020, ICML.

[4]  Jung-Woo Ha,et al.  StarGAN v2: Diverse Image Synthesis for Multiple Domains , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[5]  Bolei Zhou,et al.  Interpreting the Latent Space of GANs for Semantic Face Editing , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[6]  Luc Van Gool,et al.  SMIT: Stochastic Multi-Label Image-to-Image Translation , 2018, 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW).

[7]  Francesc Moreno-Noguer,et al.  GANimation: Anatomically-aware Facial Animation from a Single Image , 2018, ECCV.

[8]  Samuli Laine,et al.  Feature-Based Metrics for Exploring the Latent Space of Generative Models , 2018, ICLR.

[9]  Jung-Woo Ha,et al.  StarGAN: Unified Generative Adversarial Networks for Multi-domain Image-to-Image Translation , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[10]  Mohammad H. Mahoor,et al.  AffectNet: A Database for Facial Expression, Valence, and Arousal Computing in the Wild , 2017, IEEE Transactions on Affective Computing.

[11]  Sepp Hochreiter,et al.  GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium , 2017, NIPS.

[12]  Jonathon Shlens,et al.  Conditional Image Synthesis with Auxiliary Classifier GANs , 2016, ICML.

[13]  J. Schulman,et al.  InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets , 2016, NIPS.

[14]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[15]  Aaron C. Courville,et al.  Generative Adversarial Networks , 2014, 1406.2661.

[16]  Yong Tao,et al.  Compound facial expressions of emotion , 2014, Proceedings of the National Academy of Sciences.

[17]  Sunghwan Mac Kim,et al.  EMOTIONS IN TEXT: DIMENSIONAL AND CATEGORICAL MODELS , 2013, Comput. Intell..

[18]  Michael Potegal,et al.  Screaming, yelling, whining, and crying: categorical and intensity differences in vocal expressions of anger and sadness in children's tantrums. , 2011, Emotion.

[19]  Takeo Kanade,et al.  The Extended Cohn-Kanade Dataset (CK+): A complete dataset for action unit and emotion-specified expression , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition - Workshops.

[20]  J. Russell,et al.  Facial and vocal expressions of emotion. , 2003, Annual review of psychology.

[21]  Takeo Kanade,et al.  Comprehensive database for facial expression analysis , 2000, Proceedings Fourth IEEE International Conference on Automatic Face and Gesture Recognition (Cat. No. PR00580).

[22]  I. E. Josephs,et al.  The expressive and communicative functions of preschool children's smiles in an achievement-situation , 1991 .

[23]  J. Russell,et al.  A cross-cultural study of a circumplex model of affect. , 1989 .

[24]  J. Russell A circumplex model of affect. , 1980 .

[25]  R. Kraut,et al.  Social and emotional messages of smiling: An ethological approach. , 1979 .

[26]  P. Ekman,et al.  Constants across cultures in the face and emotion. , 1971, Journal of personality and social psychology.

[27]  Harshad Rai,et al.  Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks , 2018 .

[28]  Geoffrey E. Hinton,et al.  Visualizing Data using t-SNE , 2008 .

[29]  K. Scherer Psychological models of emotion. , 2000 .

[30]  R. Larsen,et al.  Promises and problems with the circumplex model of emotion. , 1992 .