Glance: rapidly coding behavioral video with the crowd

Behavioral researchers spend a considerable amount of time coding video data to systematically extract meaning from subtle human actions and emotions. In this paper, we present Glance, a tool that allows researchers to rapidly query, sample, and analyze large video datasets for behavioral events that are hard to detect automatically. Glance takes advantage of the parallelism available in paid online crowds to interpret natural language queries and then aggregates responses in a summary view of the video data. Glance provides analysts with rapid responses when initially exploring a dataset, and reliable codings when refining an analysis. Our experiments show that Glance can code nearly 50 minutes of video in 5 minutes by recruiting over 60 workers simultaneously, and can return initial feedback to analysts in under 10 seconds for most clips. We present and compare new methods for accurately aggregating the input of multiple workers marking the spans of events in video data, and for measuring the quality of their coding in real time, before a baseline is established, by measuring the variance between workers. Glance's rapid responses to natural language queries, feedback regarding question ambiguity and anomalies in the data, and ability to build on prior context in follow-up queries allow users to have a conversation-like interaction with their data, opening up new possibilities for naturally exploring video data.
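The aggregation step described above, combining the event spans marked by multiple workers, can be sketched as a majority-vote sweep over the marked intervals. This is a minimal illustration under assumed inputs, not the paper's actual aggregation method; the function name `aggregate_spans` and its signature are hypothetical:

```python
def aggregate_spans(worker_spans, threshold=0.5):
    """Merge event spans marked by multiple workers, keeping regions
    where at least `threshold` fraction of workers agree.

    worker_spans: one list of (start, end) tuples per worker, in seconds.
    Returns a list of (start, end) spans with sufficient agreement.
    """
    n_workers = len(worker_spans)
    if n_workers == 0:
        return []

    # Sweep line: +1 where a worker's span opens, -1 where it closes.
    events = []
    for spans in worker_spans:
        for start, end in spans:
            events.append((start, 1))
            events.append((end, -1))
    events.sort()  # ties sort closes (-1) before opens (+1)

    agreed = []
    active = 0            # number of workers currently marking an event
    region_start = None   # start of the current agreed region, if any
    needed = threshold * n_workers
    for time, delta in events:
        active += delta
        if active >= needed and region_start is None:
            region_start = time
        elif active < needed and region_start is not None:
            agreed.append((region_start, time))
            region_start = None
    return agreed
```

For example, if three workers mark spans (0, 10), (2, 8), and (5, 12), a 0.5 threshold keeps only the region where at least two of the three overlap, yielding (2, 10).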
