The Meme Quiz: A Facial Expression Game Combining Human Agency and Machine Involvement

We describe a game with a purpose called The Meme Quiz, in which a human player mimics popular Internet memes and the system guesses which expression the player imitated. The purpose of the game is to collect a useful dataset of in-the-wild facial expressions. The game was deployed, with 198 players contributing 2,860 labeled images. In contrast to many data-gathering games that use interaction between humans to define the mechanics and verify the data, our game has an online machine learning system at its core. As more people play and make faces, The Meme Quiz gathers more data and makes better guesses over time. One main advantage of this setup is the ability to monitor the usefulness of the data as it is collected and to watch for improvement, instead of waiting until the end of the game to process the data. Our contributions are 1) the design and deployment of a game for collecting diverse, real-world facial expression data and 2) an exploration of the design space of data-gathering games along two axes: human agency and machine involvement, including advantages of building a game around an interactive domain-specific technical system.

Keywords: games with a purpose, facial expression recognition, crowdsourcing, computer vision, machine learning

1. INTRODUCTION

Facial expression recognition is an important part of affective computing. Typically, only the six basic expressions of joy, sadness, fear, anger, surprise, and disgust are used in affective computing, a small set of facial expressions by all accounts. We are interested in extending the capabilities of automated expression recognition and in collecting a new dataset of facial expressions that includes many new expressions. To do so, we take advantage of the broad set of facial expressions that appear in Internet memes.

Figure 1: Example Internet memes portraying several different emotions: (a) Not Bad Obama, (b) Not Impressed McKayla. Do these emotions have obvious names, or is the picture itself a more concise way of conveying the emotion?

Reaction images [1], known by the shorthand MRW ("My Reaction When"), are a type of meme that portrays an emotion in response to something said or experienced. "Not Bad Obama" and "Not Impressed McKayla", shown in Figure 1, are two recognizable media images that have been elevated to meme status. MRW memes can also include non-human faces, such as "Grumpy Cat" in Figure 6(c). These reaction images may have value beyond entertainment; Figure 2 shows an MRW meme known as "Success Kid" annotated with a story about a user's breakthrough using reaction memes to convey emotional state to a therapist. Communicative expression-related memes would be useful to affective computing, where a primary goal is to assist people who struggle with reading or communicating emotions in their everyday lives.

Although reaction memes themselves are popular on the Internet, there is no data source of everyday people portraying these same facial expressions. Anticipating that imitating memes would not only generate useful data but also be amusing and compelling, we set out to build a game to crowdsource photos of people imitating meme facial expressions. Since our end goal is to use the collected dataset to train an automated system to recognize these new expressions, we decided to build the training and teaching of this system into the game. In this paper we present The Meme Quiz, a game we developed in which the player is asked to act out a meme expression and the system guesses which meme the player is imitating.
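To make this guess-then-learn loop concrete, the following minimal sketch shows one plausible realization, assuming HOG features and a nearest-neighbor classifier (both of which appear in our references); the class and method names are illustrative, not the deployed implementation.

    import numpy as np
    from skimage.feature import hog

    class OnlineMemeGuesser:
        """Illustrative guess-then-learn core of the game."""

        def __init__(self):
            self.features = []   # one feature vector per labeled face so far
            self.labels = []     # which meme that face was imitating

        def featurize(self, face):
            # face: a cropped, grayscale face image as a 2D numpy array,
            # resized to a fixed resolution so feature vectors align.
            return hog(face, pixels_per_cell=(16, 16))

        def guess(self, face):
            # With no data yet, the system must guess blindly; this is
            # acceptable because mistakes are part of the fun.
            if not self.features:
                return None
            x = self.featurize(face)
            dists = [np.linalg.norm(x - f) for f in self.features]
            return self.labels[int(np.argmin(dists))]   # 1-nearest neighbor

        def learn(self, face, true_meme):
            # Online learning: every round contributes one labeled example,
            # so no bootstrap dataset is needed and guesses improve with play.
            self.features.append(self.featurize(face))
            self.labels.append(true_meme)

Each round thus both entertains the player (the guess) and grows the dataset (the labeled photo).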
Over time, as the game collects more data, the system improves and is able to guess expressions correctly.

Figure 2: A "My Reaction When" meme depicting a story about using MRW memes to convey emotional state to a therapist.

We designed our game such that it does not require the expression recognition technology to work perfectly; the fun of the game comes from the fact that the system occasionally makes mistakes. In fact, our system can start learning immediately and does not need to be bootstrapped with initial data. In the middle of deployment, we were able to adapt the game by adding new memes to impersonate, and we were able to monitor the health of the data over time to make sure the system was in fact learning these novel facial expressions (see the monitoring sketch at the end of this section). Because our game uses online machine learning as its core mechanic, it is different from other crowdsourced data-generation games. In the rest of this paper, we explore the space of crowdsourced data-generation games along the two dimensions of human agency and machine involvement, and describe how The Meme Quiz fits as a game with high agency for both the human and the computer. Our contributions are 1) the design and deployment of a game for collecting diverse facial expression data and 2) an exploration of the design space of data-gathering games along two axes: human agency and machine involvement, including advantages of building a game around an interactive domain-specific technical system.
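The monitoring mentioned above can be realized with progressive validation: each incoming photo first tests the current model and only then joins the training set, yielding an accuracy curve over the deployment. A minimal sketch, with illustrative names, assuming a guesser object like the one sketched earlier:

    from collections import deque

    def progressive_accuracy(guesser, rounds, window=100):
        # rounds: iterable of (face, true_meme) pairs in the order played.
        # Yields a moving-window accuracy after each round; a rising curve
        # indicates the system really is learning from the incoming data.
        recent = deque(maxlen=window)
        for face, true_meme in rounds:
            recent.append(guesser.guess(face) == true_meme)  # test first...
            guesser.learn(face, true_meme)                   # ...then train
            yield sum(recent) / len(recent)

Because every guess is scored before the example is used for training, the curve reflects genuine generalization rather than memorization of already-seen photos.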
2. RELATED WORK

This section focuses on background work related to facial expressions and crowdsourcing of these expressions. Games with a purpose are highly relevant, and we discuss many games in Section 3 on our proposed design space for data-gathering games.

Name        Subjects   Data per subject     Expressions
CK+         127        4 videos             6
Multi-PIE   337        2,225 photos         6
MMI         90         20 videos+photos     6+AUs
AM-FED      242        1 video              2

Table 1: Comparison of facial expression datasets

There are a number of existing facial expression datasets, such as CK+ [15], CMU Multi-PIE [7], and MMI [18], which have been laboriously captured and annotated with emotion and Action Unit (AU) labels from the Facial Action Coding System (FACS). These datasets have fueled facial expression recognition research for over a decade, and Table 1 shows a comparison of them. These standard datasets are often collected in controlled lab environments with, at most, a few hundred subjects. In practical applications, face trackers must work on a wide variety of facial appearances, in many different lighting conditions, and on more than six expressions. Bigger datasets are necessary, as well as datasets captured in the real world in realistic situations, such as using a webcam or a front-facing camera on a mobile phone. Our game captures faces in realistic capture conditions and includes many more expressions.

The AM-FED [16] dataset was also captured "in the wild" by recording subjects' faces as they watched Super Bowl advertisements on their own computers. As an incentive for allowing their reactions to be recorded, subjects were shown a chart of their smile activity compared to others, which is an interesting integration of computer vision back into the user experience. The expressions captured in AM-FED are spontaneous (or as spontaneous as they can be when the subjects are aware they are being recorded), but the videos were chosen to elicit joy only, so the dataset does not span a wide range of emotions, or even very extreme emotions. While our own dataset is posed, it includes the basic expressions as well as many more, captured in real-world environments.

Capturing spontaneous expressions is difficult, as it requires subjecting users to unpleasant stimuli designed to elicit emotions such as disgust, fear, or pain, and different people might not be sensitive to the same stimuli. Recently, Li et al. [14] and Yan et al. [26] have compiled datasets of spontaneous micro-expressions using videos chosen to elicit emotions including joy, surprise, and disgust, while encouraging subjects to try to hide their emotions. Zhang et al. [27] have also captured a 3D dataset of spontaneous expressions by engaging lab subjects in different activities, such as playing an embarrassing game or experiencing an unpleasant smell. We believe acting out expressions is more fun for the participant than being subjected to unpleasant stimuli.

New datasets of labeled examples of facial expressions that span a wide variety of people, capture conditions, and emotions are critical to the future of automated expression recognition and affective computing. Compared especially to bringing subjects in to act out expressions in person, online crowdsourcing has the potential to recruit many more subjects and collect far more data. The Meme Quiz is one of many possible ways to realize the mechanics and incentives of crowdsourcing facial data.

3. DESIGN SPACE OF DATA-GATHERING GAMES AND SYSTEMS

Before we describe our game, we want to define the design space of games with a purpose (GWAPs), specifically those used for gathering data, and position The Meme Quiz within that space. Games with a purpose, such as the ESP Game [25], were first introduced to produce large, labeled datasets to be used as training data for computer vision and machine learning tasks. Since then, games including BeFaced [22] and Motion Chain [20] have been developed to generate new data. We will call these games data-gathering games, with data-generation games as a subset. Paid crowdsourcing through micro-task platforms like Mechanical Turk is also a common way to gather and generate data. Not all games with a purpose are data-gathering games; some, like Foldit [2], Phylo [9], and EteRNA [13], are about solving puzzles and understanding the human process of finding optimal solutions, rather than simply collecting a dataset. The Meme Quiz is ultimately about collecting a dataset, but also understanding the system's learning process. In order to understand what makes ou

REFERENCES

[1] Peter E. Hart et al. Nearest neighbor pattern classification, 1967, IEEE Trans. Inf. Theory.
[2] Barbara S. Page. Hamlet on the Holodeck: The Future of Narrative in Cyberspace, 1999.
[3] Dave Morris et al. Game Architecture and Design with CD-ROM, 1999.
[4] Laura A. Dabbish et al. Labeling images with a computer game, 2004, AAAI Spring Symposium: Knowledge Collection from Volunteer Contributors.
[5] Maja Pantic et al. Web-based database for facial expression analysis, 2005, IEEE International Conference on Multimedia and Expo.
[6] Bill Triggs et al. Histograms of oriented gradients for human detection, 2005, IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).
[7] Shree K. Nayar et al. FaceTracer: A Search Engine for Large Collections of Images with Faces, 2008, ECCV.
[8] Takeo Kanade et al. Multi-PIE, 2008, 8th IEEE International Conference on Automatic Face & Gesture Recognition.
[9] Shree K. Nayar et al. Attribute and simile classifiers for face verification, 2009, IEEE 12th International Conference on Computer Vision.
[10] Takeo Kanade et al. The Extended Cohn-Kanade Dataset (CK+): A complete dataset for action unit and emotion-specified expression, 2010, IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops.
[11] David Salesin et al. The challenge of designing scientific discovery games, 2010, FDG.
[12] Zoran Popovic et al. PhotoCity: training experts at large-scale image acquisition through a competitive game, 2011, CHI.
[13] Sandra B. Fan et al. Picard: a creative and social online flashcard learning game, 2012, FDG.
[14] M. Blanchette et al. Phylo: A Citizen Science Approach for Improving Multiple Sequence Alignment, 2012, PLoS ONE.
[15] Ira Kemelmacher-Shlizerman et al. Collection flow, 2012, IEEE Conference on Computer Vision and Pattern Recognition.
[16] Ian Spiro et al. Motion chain: a webcam game for crowdsourcing gesture collection, 2012, CHI Extended Abstracts.
[17] C. Lawrence Zitnick et al. Bringing Semantics into Focus Using Visual Abstraction, 2013, IEEE Conference on Computer Vision and Pattern Recognition.
[18] Matti Pietikäinen et al. A Spontaneous Micro-expression Database: Inducement, collection and baseline, 2013, 10th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG).
[19] Jonathan Krause et al. Fine-Grained Crowdsourcing for Fine-Grained Recognition, 2013, IEEE Conference on Computer Vision and Pattern Recognition.
[20] Qi Wu et al. CASME database: A dataset of spontaneous micro-expressions collected from neutralized faces, 2013, 10th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG).
[21] Daniel McDuff et al. Affectiva-MIT Facial Expression Dataset (AM-FED): Naturalistic and Spontaneous Facial Expressions Collected "In-the-Wild", 2013, IEEE Conference on Computer Vision and Pattern Recognition Workshops.
[22] Shaun J. Canavan et al. BP4D-Spontaneous: a high-resolution spontaneous 3D dynamic facial expression database, 2014, Image Vis. Comput.
[23] Yoshua Bengio et al. Challenges in representation learning: A report on three machine learning contests, 2013, Neural Networks.
[24] Andrew Marlton. Know Your Meme, 2013.
[25] Minjae Lee et al. RNA design rules from a massive open laboratory, 2014, Proceedings of the National Academy of Sciences.
[26] Ming Yang et al. DeepFace: Closing the Gap to Human-Level Performance in Face Verification, 2014, IEEE Conference on Computer Vision and Pattern Recognition.
[27] Chek Tien Tan et al. A game to crowdsource data for affective computing, 2014, FDG.