Ruby: A Robotic Platform for Real-time Social Interaction

The majority of our waking hours are spent engaging in social interactions. Some of these interactions occur at the level of long-term strategic planning, while others take place at faster time scales, such as in conversations or card games. The ability to perceive subtle gestural, postural, and facial cues, in addition to verbal language, in real time is a critical component of these faster interactions. An understanding of the underlying perceptual primitives that support this kind of real-time social cognition is key to understanding social development. Robots present an ideal opportunity to study the development of social interaction in infants [Fasel, Deak, Triesch, Movellan 2002]. It is possible to create robots that exhibit precisely controlled contingency structures. By observing how infants interact with these robots, we gain an opportunity to understand how infants identify the operating characteristics of the social agents with whom they interact. We have recently developed a social interaction robot, "Ruby", designed to communicate with children. Ruby is endowed with the following real-time perceptual and motor primitives to facilitate social interaction: face tracking, motor control, and speech detection. It communicates via head and eye movements, and we have recently run pilot studies indicating that Ruby is fun and non-threatening to children. Ruby's face tracking system consists of three cues taken from three inputs. The first two inputs are high-resolution pan-tilt-zoom color cameras, which serve as the "eyes". The third input is an omni-directional camera acting as Ruby's peripheral vision. Each eye uses the MPLab's contrast-feature-based frontal face finder [Fasel et al CVIU2004] and an adaptive color-based tracker [Ishiguro et al 2003] [Hershey et al CVPR2004]. Ruby combines the two to find both frontal and rotated faces at more than 30 frames per second. Ruby's motor control system currently has three components: neck control, eye control, and control of external objects for experiments.
Ruby also features speech detection [Pellom 2004] and can respond to detected speech with variable delay parameters. We are now adding eye and eye-blink detection [Fasel et al CVIU2004], expression recognition [Littlewort-Ford, Bartlett et al 2004], recognition of common communicative words in English, arm movements, finger pointing, and touch sensors. We hope to use Ruby to collect and analyze data on social interaction and contingency, and on the development of social interaction in infants.
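
As an illustration of what a controlled contingency structure with variable response delay could look like, the following is a minimal sketch and not Ruby's actual implementation. The names `ContingencySchedule` and `schedule_responses`, the parameter values, and the choice of a uniform delay distribution are all assumptions made for this example.

```python
import random
from dataclasses import dataclass


@dataclass
class ContingencySchedule:
    """Hypothetical contingency parameters (assumed, not from the Ruby system)."""
    min_delay_s: float = 0.5    # shortest delay between speech onset and response
    max_delay_s: float = 2.0    # longest delay between speech onset and response
    response_prob: float = 1.0  # probability that a speech event gets a response


def schedule_responses(speech_onsets, schedule, rng):
    """Return (onset, response_time) pairs for the speech events that are answered.

    Each detected speech onset is answered with probability `response_prob`,
    after a delay drawn uniformly from [min_delay_s, max_delay_s].
    """
    responses = []
    for onset in speech_onsets:
        if rng.random() < schedule.response_prob:
            delay = rng.uniform(schedule.min_delay_s, schedule.max_delay_s)
            responses.append((onset, onset + delay))
    return responses


# Example: three detected speech onsets (in seconds), seeded for repeatability.
rng = random.Random(0)
onsets = [1.0, 4.2, 9.7]
for onset, t in schedule_responses(onsets, ContingencySchedule(), rng):
    print(f"speech at {onset:.1f}s -> respond at {t:.2f}s")
```

Under these assumptions, lowering `response_prob` or widening the delay range would make the robot's behavior less tightly contingent on the child's vocalizations, which is the kind of manipulation an experiment on contingency detection might vary.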