Modeling Social Group Communication with Multi-Agent Imitation Learning

Toward enabling next-generation robots capable of socially intelligent interaction with humans, we present a $\mathbf{computational\; model}$ of interactions in a social environment with multiple agents and multiple groups. The Multiagent Group Perception and Interaction (MGpi) network is a deep neural network that predicts the appropriate social action to execute in a group conversation (e.g., speak, listen, respond, leave), taking into account neighbors' observable features (e.g., location of people, gaze orientation, distraction). A central component of MGpi is the Kinesic-Proxemic-Message (KPM) gate, which performs social signal gating to extract important information from a group conversation. In particular, the KPM gate filters incoming social cues from nearby agents by observing their body gestures (kinesics) and spatial behavior (proxemics). The MGpi network and its KPM gate are trained via imitation learning, using demonstrations from a $\mathbf{social\; interaction\; simulator}$ that we designed. Further, we demonstrate the efficacy of the KPM gate as a social attention mechanism, achieving state-of-the-art performance on the task of $\mathbf{group\; identification}$ without using explicit group annotations, layout assumptions, or manually chosen parameters.
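
To make the idea of social signal gating concrete, the following is a minimal sketch and not the MGpi architecture itself. It assumes each neighbor contributes a message vector plus one scalar kinesic score and one scalar proxemic score, and that a sigmoid of their weighted sum decides how much of that message passes through. The function name `kpm_gate`, the scalar scores, and the weights `w_k`, `w_p`, and `bias` are all hypothetical and chosen only for illustration; in the paper the gate is learned from demonstrations rather than set by hand.

```python
import math


def sigmoid(x):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def kpm_gate(messages, kinesic, proxemic, w_k=1.0, w_p=1.0, bias=0.0):
    """Illustrative gate (assumption, not the authors' network).

    messages: list of equal-length message vectors, one per neighbor
    kinesic:  list of scalar body-gesture scores, one per neighbor
    proxemic: list of scalar spatial-behavior scores, one per neighbor
    Returns the gated sum of messages and the per-neighbor gate values.
    """
    dim = len(messages[0]) if messages else 0
    gated = [0.0] * dim
    gates = []
    for msg, k, p in zip(messages, kinesic, proxemic):
        # Gate in (0, 1): high when both cues suggest the neighbor is engaged.
        g = sigmoid(w_k * k + w_p * p + bias)
        gates.append(g)
        for i, value in enumerate(msg):
            gated[i] += g * value
    return gated, gates


# Two neighbors: one nearby and facing us, one far away and turned aside.
messages = [[1.0, 0.0], [0.0, 1.0]]
gated, gates = kpm_gate(messages, kinesic=[2.0, -2.0], proxemic=[2.0, -2.0])
```

In this toy setting the first neighbor's gate is close to 1 and the second's is close to 0, so the aggregated message is dominated by the engaged neighbor. Read this way, the gate values resemble soft attention weights, which is the sense in which such a gate could also hint at who belongs to the same conversational group.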
