Automatically Classifying User Engagement for Dynamic Multi-party Human–Robot Interaction

A robot agent designed to engage in real-world human–robot joint action must be able to understand the social states of the human users it interacts with in order to behave appropriately. In particular, in a dynamic public space, a crucial task for the robot is to determine the needs and intentions of all of the people in the scene, so that it interacts only with those who intend to interact with it. We address the task of estimating the engagement state of customers for a robot bartender based on data from audiovisual sensors. We begin with an offline experiment using hidden Markov models, which confirms that the sensor data contains the information necessary to estimate user state. We then present two strategies for online state estimation: a rule-based classifier derived from observed human behaviour in real bars, and a set of supervised classifiers trained on a labelled corpus. These strategies are compared in offline cross-validation, in an online user study, and through validation against a separate test corpus. These studies show that while the trained classifiers perform best in cross-validation, the rule-based classifier performs best on novel data; however, all of the classifiers change their estimates too frequently for practical use. To address this issue, we present a final classifier based on Conditional Random Fields, which achieves comparable performance on the test data with increased stability. Overall, the rule-based classifier performs competitively with the trained classifiers, suggesting that for this task such a simple model could be the preferred option: it provides useful online performance while avoiding the implementation and data-scarcity issues involved in using machine learning.
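The two practical observations above, that a simple rule can classify engagement and that frame-by-frame estimates flip too often, can be illustrated with a short Python sketch. Everything in it is an assumption for illustration: the two cues (distance to the bar and facing the bartender), the 0.3 m threshold, the `Frame` type and the helper names are hypothetical and are not the paper's feature set. The smoothing step is Viterbi decoding over a two-label chain with a hand-set switching penalty. It shows why a sequence model with transition weights, such as a linear-chain CRF, tends to give more stable estimates, but it is not the CRF described here, whose weights would be learned from labelled data.

```python
from dataclasses import dataclass

SEEKING, NOT_SEEKING = "seeking", "not_seeking"
LABELS = (SEEKING, NOT_SEEKING)


@dataclass
class Frame:
    """One sensor frame for one customer (hypothetical feature set)."""
    distance_to_bar_m: float  # assumed: distance from the bar counter, in metres
    facing_bar: bool          # assumed: torso/head turned towards the bartender


def rule_based(frame: Frame, max_dist_m: float = 0.3) -> str:
    """Hypothetical rule: close to the bar AND facing it means seeking engagement."""
    if frame.distance_to_bar_m <= max_dist_m and frame.facing_bar:
        return SEEKING
    return NOT_SEEKING


def count_changes(labels: list[str]) -> int:
    """Number of times the estimate flips between consecutive frames."""
    return sum(1 for a, b in zip(labels, labels[1:]) if a != b)


def viterbi_smooth(raw: list[str], switch_cost: float) -> list[str]:
    """Highest-scoring label sequence under agreement scores and a switching penalty.

    Each frame scores 1.0 for agreeing with the raw per-frame label and 0.0
    otherwise; every change of label between frames costs `switch_cost`.
    """
    if not raw:
        return []

    def unary(t: int, label: str) -> float:
        return 1.0 if raw[t] == label else 0.0

    def pairwise(prev: str, cur: str) -> float:
        return -switch_cost if prev != cur else 0.0

    best = {lab: unary(0, lab) for lab in LABELS}  # best score of a path ending in lab
    backptrs = []  # backptrs[t-1][lab] = best predecessor of lab at frame t
    for t in range(1, len(raw)):
        new_best, ptr = {}, {}
        for cur in LABELS:
            prev = max(LABELS, key=lambda p: best[p] + pairwise(p, cur))
            new_best[cur] = best[prev] + pairwise(prev, cur) + unary(t, cur)
            ptr[cur] = prev
        best = new_best
        backptrs.append(ptr)

    # Trace back from the best final label to recover the whole sequence.
    path = [max(LABELS, key=lambda lab: best[lab])]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    return path[::-1]


# A customer approaches, waits at the bar, and glances away once (frame 4).
FRAMES = [
    Frame(1.5, False), Frame(1.0, True), Frame(0.2, True), Frame(0.2, True),
    Frame(0.2, False), Frame(0.2, True), Frame(0.2, True),
]

if __name__ == "__main__":
    raw = [rule_based(f) for f in FRAMES]
    smooth = viterbi_smooth(raw, switch_cost=1.5)
    print(raw, count_changes(raw))
    print(smooth, count_changes(smooth))
```

On this example sequence the rule's raw estimate changes three times, because the single glance away at frame 4 flips it; with `switch_cost=1.5` the decoded sequence changes only once, when the customer reaches the bar. Raising the penalty trades responsiveness for stability.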
