Detecting Hands in Children's Egocentric Views to Understand Embodied Attention during Social Interaction

Sven Bambach†, John M. Franchak, David J. Crandall†, Chen Yu
{sbambach, jmfranch, djcran, chenyu}@indiana.edu
† School of Informatics and Computing, Indiana University
Department of Psychological and Brain Sciences, Indiana University
Bloomington, IN, 47405 USA

Abstract

Understanding visual attention in children could yield insight into how the visual system develops during formative years and how children's overt attention plays a role in development and learning. We are particularly interested in the role of hands and hand activities in children's visual attention. We used head-mounted cameras to collect egocentric video and eye gaze data from toddlers during playful social interaction with their parents, and developed a computer vision system to track and label different hands within the child's field of view. We report detailed results on the appearance frequencies and spatial distributions of parents' and children's hands, both in the child's field of view and as the target of the child's attentional fixation.

Keywords: Attention; Development; Eye tracking; Vision

Introduction

The visual world is cluttered with objects and events generated by oneself and others. To efficiently process this cluttered and complex visual world, perceptual and cognitive systems must selectively attend to a subset of the available information. Attention can be viewed as a spatial spotlight (Posner, 1980) that can be implemented both internally and externally. Although adults can attend to a location outside the area targeted by eye gaze (Shepherd, Findlay, & Hockey, 1986), attention is often tied to the body and sensory-motor behaviors: adults typically orient gaze direction to coincide with the focus of the attentional spotlight. Studies of adults engaged in complex tasks, from making sandwiches to copying block patterns (Ballard, Hayhoe, Pook, & Rao, 1997; Hayhoe & Ballard, 2005), suggest that the momentary disposition of the body in space serves as a deictic (pointing) reference for binding sensory objects to internal computations (Ballard et al., 1997; Spivey, Tyler, Richardson, & Young, 2000). These studies analyzed the coordination of eye, head, and hands by measuring multiple streams of behavior in free-flowing tasks with multiple goals and targets for attention.

Attention and information selection are critical in early development and learning (Mundy & Newell, 2007), as early attention is predictive of later developmental outcomes (Ruff & Rothbart, 1996). Most studies of the development of attention employ highly controlled experimental tasks in the laboratory. Many use remote eye tracking systems to measure looking behaviors, revealing much about the visual attention of toddlers as they passively examine visual stimuli displayed on a computer screen. However, more recent studies using head-mounted eye tracking have addressed visual selection in freely moving toddlers engaged in everyday tasks (Franchak, Kretch, Soska, & Adolph, 2011). In more natural interactions, there are multiple objects competing for attention, various manual actions toward those toy objects, and spontaneous goals. Visual attention changes from moment to moment according to the child's own actions and the parent's actions toward the child and objects. Though complex, these are the contexts in which real-world learning occurs. Compared with adults, young children's attentional systems may be even more closely tied to bodily actions.

The goal of the present study is to understand how sensory-motor behavior supports effective visual attention in toddlers. Toward this goal, we developed a more naturalistic experimental paradigm in which a child and parent wear head-mounted eye trackers while freely engaged with a set of toys. Each eye tracking system captures egocentric video from a first-person perspective as well as gaze direction within that first-person view. In this way, we precisely measure the visual attention of both the parent and child, and also their manual actions. Recent findings using the same paradigm show that during toy play, both children and parents visually attend not only to the objects they hold themselves but also to the objects held by the social partner (Yu & Smith, 2013); in doing so, they create and maintain coordinated visual attention by looking at the same object at the same time, and the jointly attended object is likely to be held by either the child or the parent. Similarly, other work has shown that by holding objects, parents increase the likelihood that infants will look at the parents' hands (Franchak et al., 2011). These results suggest an important role for hands and hand activities (of both children and parents) in toddlers' visual attention.

Given these previous findings, the present study provides new evidence on how eye and hand actions interact to support effective visual attention to objects in toddlers. We first describe a new method to automatically detect hands and faces in egocentric video, allowing us to locate, at the pixel level, both one's own hands and the social partner's hands in the first-person view. Next, we report a series of results that link hands and hand actions with visual attention, showing how the child's and parent's hands contribute to visual information selection in the child's view.
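To make the detection problem concrete, the sketch below shows one simple way to localize candidate skin regions (hands and faces) at the pixel level in an egocentric frame. It is an illustrative baseline, not the detector developed in this paper: the YCrCb color thresholds, the minimum region area, and the function name are all assumptions, and a real system would additionally need to separate hands from faces and the child's hands from the parent's.

import cv2
import numpy as np

def detect_skin_regions(frame_bgr, min_area=500):
    # Convert to YCrCb, where skin tones cluster in the chroma channels.
    ycrcb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2YCrCb)
    # Approximate, commonly used skin range; a real system would fit a
    # per-participant color model rather than use a fixed threshold.
    lower = np.array([0, 133, 77], dtype=np.uint8)
    upper = np.array([255, 173, 127], dtype=np.uint8)
    mask = cv2.inRange(ycrcb, lower, upper)
    # Remove speckle noise before extracting connected regions.
    kernel = np.ones((5, 5), np.uint8)
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
    # Keep connected skin-colored regions large enough to be a hand or face.
    num, _, stats, centroids = cv2.connectedComponentsWithStats(mask)
    regions = []
    for i in range(1, num):  # label 0 is the background
        if stats[i, cv2.CC_STAT_AREA] >= min_area:
            x, y, w, h = (int(v) for v in stats[i, :4])
            regions.append({"bbox": (x, y, w, h),
                            "centroid": (float(centroids[i][0]),
                                         float(centroids[i][1]))})
    return mask, regions

# Usage on a single egocentric frame ("frame.png" is a placeholder path):
# mask, regions = detect_skin_regions(cv2.imread("frame.png"))

A fixed color threshold like this is brittle under the lighting changes and motion blur typical of free play, which is one reason pixel-level hand detection in egocentric video is a research problem in its own right.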
Experiment

To realize our overall goal of measuring visual attention in natural interactions, we developed a multi-modal sensing system that allows us to capture a wide variety of video and sensing data from participants in our lab.
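Once hands are localized in each frame, linking them to gaze is straightforward bookkeeping: for every frame, test whether the gaze point falls inside a detected hand region and accumulate counts per region label. The following is a minimal sketch of this analysis step, assuming an illustrative per-frame data layout of gaze coordinates plus labeled bounding boxes (the labels, layout, and function name are hypothetical, not the format used in the study):

from collections import Counter

def fixation_rates(frames):
    # frames: iterable of (gaze_xy, regions) pairs, where regions maps a
    # label such as "child_hand" or "parent_hand" to a box (x, y, w, h).
    counts = Counter()
    total = 0
    for (gx, gy), regions in frames:
        total += 1
        for label, (x, y, w, h) in regions.items():
            # Count a fixation on this region when the gaze point is inside it.
            if x <= gx < x + w and y <= gy < y + h:
                counts[label] += 1
    return {label: n / total for label, n in counts.items()} if total else {}

# Two synthetic frames: gaze lands on the child's hand, then the parent's.
frames = [
    ((120, 200), {"child_hand": (100, 180, 60, 60), "parent_hand": (400, 50, 80, 70)}),
    ((410, 60), {"child_hand": (90, 170, 60, 60), "parent_hand": (395, 45, 80, 70)}),
]
print(fixation_rates(frames))  # {'child_hand': 0.5, 'parent_hand': 0.5}

Accumulating the gaze points and region centroids over frames, rather than only the counts, would give the spatial distributions reported in the results.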

References

Baldwin, D. A., et al. (2011). Infants recognize similar goals across dissimilar actions involving object manipulation. Cognition.

Ballard, D. H., Hayhoe, M. M., Pook, P. K., & Rao, R. P. N. (1997). Deictic codes for the embodiment of cognition. Behavioral and Brain Sciences.

Chen, F.-S., Fu, C.-M., & Huang, C.-L. (2003). Hand gesture recognition using a real-time tracking method and hidden Markov models. Image and Vision Computing.

Comaniciu, D., & Meer, P. (2002). Mean shift: A robust approach toward feature space analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence.

Foulsham, T., Walker, E., & Kingstone, A. (2011). The where, what and when of gaze allocation in the lab and the natural environment. Vision Research.

Franchak, J. M., Kretch, K. S., Soska, K. C., & Adolph, K. E. (2011). Head-mounted eye tracking: A new method to describe infant looking. Child Development.

Hayhoe, M., & Ballard, D. (2005). Eye movements in natural behavior. Trends in Cognitive Sciences.

Mundy, P., & Newell, L. (2007). Attention, joint attention, and social cognition. Current Directions in Psychological Science.

Posner, M. I. (1980). Orienting of attention. Quarterly Journal of Experimental Psychology.

Ren, X., & Gu, C. (2010). Figure-ground segmentation improves handled object recognition in egocentric video. IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

Ruff, H. A., & Rothbart, M. K. (1996). Attention in early development: Themes and variations. Oxford University Press.

Shepherd, M., Findlay, J. M., & Hockey, R. J. (1986). The relationship between eye movements and spatial attention. Quarterly Journal of Experimental Psychology A: Human Experimental Psychology.

Spivey, M. J., Tyler, M. J., Richardson, D. C., & Young, E. E. (2000). Eye movements during comprehension of spoken scene descriptions. Proceedings of the 22nd Annual Conference of the Cognitive Science Society.

Ullman, S., Harari, D., & Dorfman, N. (2012). From simple innate biases to complex visual concepts. Proceedings of the National Academy of Sciences.

Yu, C., & Smith, L. B. (2013). Joint attention without gaze following: Human infants and their parents coordinate visual attention to objects through eye-hand coordination. PLoS ONE.

Yu, C., et al. (2013). Understanding embodied visual attention in child-parent interaction. IEEE Joint International Conference on Development and Learning and Epigenetic Robotics (ICDL).