Measuring children's visual access to social information using face detection

Michael C. Frank
mcfrank@stanford.edu
Department of Psychology
Stanford University

Abstract

Other people are the most important source of information in a child’s life, and one important channel for social information is faces. Faces can convey affective, linguistic, and referential information through expressions, speech, and eye-gaze. But in order for children to apprehend this information, it must be accessible. How much of the time can children actually see the faces of the people around them? We use data from a head-mounted camera, in combination with face-detection methods from computer vision, to address this question in a scalable, automatic fashion. We develop a detection system using off-the-shelf methods and show that it produces robust results. Data from a single child’s visual experience suggest the possibility of systematic changes in the visibility of faces across the first year, possibly due to postural shifts.

Keywords: Social development; face processing; head-camera.

Introduction

Faces are perhaps the most important source of social information for young children. Infants show a preference for faces and face-like configurations from birth (Johnson, Dziurawiec, Ellis, & Morton, 1991; Farroni et al., 2005), and they will fixate faces to the exclusion of nearly everything else when attending to complex naturalistic stimuli (Frank, Vul, & Johnson, 2009; Frank, Vul, & Saxe, 2011). By their first birthday, they are sensitive to facial information about emotion (Cohn & Tronick, 1983) and social group (Kelly et al., 2005), and they will readily follow gaze to an attended target (Scaife & Bruner, 1975). As they begin to speak and understand language, joint attention becomes a powerful cue for learning the meanings of words (Baldwin, 1991).

To extract all of this important information in the natural environment, infants and children must attend to people’s faces. Nearly all of what we know about children’s attention to faces, and their understanding of them, comes from tightly controlled lab experiments. In such experiments, the stimuli are typically presented in a very accessible format: at eye level, and large enough that all details can be appreciated. How often do children actually see the faces of the people around them, though? And how often are the faces large enough to discern details from?

Head-mounted cameras provide a new technique for measuring access to faces during development. While the method of placing a miniature camera on the head of an infant or young child is still relatively new, a number of investigators have begun using it to record children’s first-person perspective (Yoshida & Smith, 2008; Aslin, 2009; Smith, Yu, & Pereira, in press). Some studies have even used head-mounted eye-trackers to measure what part of the visual scene the child is fixating, a good proxy for what parts of the world the child is attending to (Franchak, Kretch, Soska, Babcock, & Adolph, 2010).

Of particular interest is the result, reported by Franchak et al. (2010), that 14-month-olds rarely fixated their mother’s face, even when she spoke to them directly. They looked instead at her hands or other parts of her body. The authors speculated that this result might have been due to the mother’s location, usually high above the child. When mothers were sitting down, their faces were much more visible to their children. In our current investigation we follow up on this suggestion, investigating the possibility that the posture of caregivers and the infant’s own posture work together to cause developmental changes in the accessibility of social information.

The introduction of these new methods means that, for the first time, we can see what babies are looking at as they interact with, and learn from, the people around them. This development opens up many new questions for investigation. Yet work of this type is hindered by the tremendously slow and resource-intensive task of manually annotating videos, frame by frame. Until now, only a few research groups have grappled with the task of analyzing the massive datasets captured using these methods.

The current study thus serves two purposes. First, it is designed to measure the accessibility of social information, in the form of faces, to infants. To investigate this question across development, we make use of a previously described dataset (Aslin, 2009), in which a head-mounted camera recorded 2–3 hours of the visual experience of a single child at ages 3, 8, and 12 months (sample frames shown in Figure 1). Second, we investigate the possibility of using automated face detection to measure social information. It might in principle be possible to hand-annotate the presence of faces in each of the million-odd frames in our dataset (such annotation runs around 4–8 times slower than real time, yielding around 25–50 hours of total annotation time). For any larger study with more participants, annotation costs would quickly become prohibitive. Our study was thus designed to serve as a proof of concept for the automated strategy.

Detection of upright faces in static images is widely considered to be a solved problem in computer vision, with the work of Viola and Jones (2004) providing a computationally efficient solution that is now used in a wide variety of systems and consumer electronics. Nevertheless, the dataset we used presents a distinct set of challenges for such methods. In what follows, we describe our method for handling these challenges using a collection of out-of-the-box techniques from computer vision and machine learning. We end by describing
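
The hand-annotation estimate given in the introduction (coding at 4–8 times slower than real time, for a total of around 25–50 hours) can be reproduced with simple arithmetic. The sketch below assumes about 6 hours of footage in total, i.e. 2 hours at each of the three ages, the low end of the stated 2–3 hour range; that total is an assumption made here for illustration, not a figure reported in the text. The function name and the participant counts in the loop are likewise hypothetical.

```python
def annotation_hours(video_hours, slowdown, n_children=1):
    """Hours of manual coding when annotation runs `slowdown` times slower than real time."""
    return video_hours * slowdown * n_children


# Assumption: 3 recording ages x 2 hours each (low end of the 2-3 hour range).
VIDEO_HOURS = 6.0

low = annotation_hours(VIDEO_HOURS, slowdown=4)
high = annotation_hours(VIDEO_HOURS, slowdown=8)
print(f"1 child: {low:.0f} to {high:.0f} hours of annotation")

# Hypothetical larger studies, to show how the cost scales with participants.
for n in (10, 50):
    lo_n = annotation_hours(VIDEO_HOURS, 4, n_children=n)
    hi_n = annotation_hours(VIDEO_HOURS, 8, n_children=n)
    print(f"{n} children: {lo_n:.0f} to {hi_n:.0f} hours of annotation")
```

Under these assumptions a single child requires roughly 24 to 48 hours of coding, consistent with the stated 25–50 hour range, and the cost grows linearly with the number of participants.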

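As background on why the Viola and Jones approach is computationally efficient: their detector evaluates simple rectangle (Haar-like) features using an integral image, so the sum of pixels in any rectangle costs four table lookups regardless of its size. The pure-Python sketch below illustrates only that one ingredient. It is not the detection system developed in this paper, and the function names and the toy image in the usage note are hypothetical.

```python
def integral_image(img):
    """Return a (h+1) x (w+1) table where ii[y][x] is the sum of img[0:y][0:x]."""
    h = len(img)
    w = len(img[0]) if h else 0
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, top, left, height, width):
    """Sum of pixels in a rectangle, using four lookups in the integral image."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


def two_rect_feature(ii, top, left, height, width):
    """Left half minus right half of a rectangle (width is assumed to be even)."""
    half = width // 2
    left_sum = rect_sum(ii, top, left, height, half)
    right_sum = rect_sum(ii, top, left + half, height, half)
    return left_sum - right_sum
```

For example, with the toy image `[[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]`, `rect_sum(integral_image(img), 1, 1, 2, 2)` sums the four pixels 6, 7, 10, and 11. A full detector combines many such features in a trained cascade, which is omitted here.
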
[1] Gary R. Bradski, et al. Learning OpenCV - computer vision with the OpenCV library: software that sees. 2008.

[2] Linda B. Smith, et al. Not your mother's view: the dynamics of toddler visual experience. 2011, Developmental science.

[3] Paul A. Viola, et al. Robust Real-Time Face Detection. 2001, Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001.

[4] O. Pascalis, et al. Three-month-olds, but not newborns, prefer own-race faces. 2005, Developmental science.

[5] Dare A. Baldwin, et al. Infants' contribution to the achievement of joint reference. 1991, Child development.

[6] J. Bruner, et al. The capacity for joint visual attention in the infant. 1975, Nature.

[7] Michael C. Frank, et al. Development of infants’ attention to faces during the first year. 2009, Cognition.

[8] Richard N. Aslin, et al. How Infants View Natural Scenes Gathered From a Head-Mounted Camera. 2009, Optometry and vision science: official publication of the American Academy of Optometry.

[9] Mark H. Johnson, et al. Newborns' preference for face-relevant stimuli: effects of contrast polarity. 2005, Proceedings of the National Academy of Sciences of the United States of America.

[10] Richard N. Aslin, et al. Correspondences between what infants see and know about causal and self-propelled motion. 2011, Cognition.

[11] Jason S. Babcock, et al. Head-mounted eye-tracking of infants' natural interactions: a new method. 2010, ETRA.

[12] J. Cohn, et al. Three-month-old infants' reaction to simulated maternal depression. 1983, Child development.

[13] Linda B. Smith, et al. What's in View for Toddlers? Using a Head Camera to Study Visual Experience. 2008, Infancy: the official journal of the International Society on Infant Studies.

[14] Andrew McCallum, et al. Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data. 2001, ICML.

[15] Mark H. Johnson, et al. Newborns' preferential tracking of face-like stimuli and its subsequent decline. 1991, Cognition.

[16] Michael C. Frank, et al. Measuring the Development of Social Attention Using Free-Viewing. 2012, Infancy: the official journal of the International Society on Infant Studies.