Social signal processing: state-of-the-art and future perspectives of an emerging domain

The ability to understand and manage the social signals of a person we are communicating with is the core of social intelligence. Social intelligence is a facet of human intelligence that has been argued to be indispensable, and perhaps the most important, for success in life. This paper argues that next-generation computing needs to include the essence of social intelligence - the ability to recognize human social signals and social behaviours like politeness and disagreement - in order to become more effective and more efficient. Although each one of us understands the importance of social signals in everyday life situations, and in spite of recent advances in machine analysis of relevant behavioural cues like blinks, smiles, crossed arms, and laughter, the design and development of automated systems for Social Signal Processing (SSP) remain difficult. This paper surveys past efforts to solve these problems by computer, summarizes the relevant findings in social psychology, and proposes a set of recommendations for enabling the development of the next generation of socially-aware computing.

[47]  L. Smith-Lovin,et al.  INTERRUPTIONS IN GROUP DISCUSSIONS: THE EFFECTS OF GENDER AND GROUP COMPOSITION* , 1989 .

[48]  Paul Boersma,et al.  Praat, a system for doing phonetics by computer , 2002 .

[49]  Klaus R. Scherer,et al.  Vocal communication of emotion: A review of research paradigms , 2003, Speech Commun..

[50]  J. G. Taylor,et al.  Emotion recognition in human-computer interaction , 2005, Neural Networks.

[51]  Alex Pentland,et al.  Reality mining: sensing complex social systems , 2006, Personal and Ubiquitous Computing.

[52]  J. Mccroskey,et al.  Nonverbal Behavior in Interpersonal Relations , 1987 .

[53]  Lars Bretzner,et al.  Hand gesture recognition using multi-scale colour features, hierarchical models and particle filtering , 2002, Proceedings of Fifth IEEE International Conference on Automatic Face Gesture Recognition.

[54]  H. L. Tischler Introduction to Sociology, 3rd Edition , 1990 .

[55]  A. Pentland Social Signal Processing [Exploratory DSP] , 2007, IEEE Signal Processing Magazine.

[56]  Michael C. Horsch,et al.  Dynamic Bayesian networks , 1990 .

[57]  Andrea Kleinsmith,et al.  Cross-cultural differences in recognizing affect from body posture , 2006, Interact. Comput..

[58]  A.J O'Toole,et al.  3D shape and 2D surface textures of human faces: the role of "averages" in attractiveness and age , 1999, Image Vis. Comput..

[59]  Ikuo Daibo,et al.  Interactional Synchrony in Conversations about Emotional Episodes: A Measurement by “the Between-Participants Pseudosynchrony Experimental Paradigm” , 2006 .

[60]  A. Pentland,et al.  Thin slices of negotiation: predicting outcomes from conversational dynamics within the first 5 minutes. , 2007, The Journal of applied psychology.

[61]  C. Nass,et al.  Does computer-synthesized speech manifest personality? Experimental tests of recognition, similarity-attraction, and consistency-attraction. , 2001, Journal of experimental psychology. Applied.

[62]  Eytan Ruppin,et al.  Facial Attractiveness: Beauty and the Machine , 2006, Neural Computation.

[63]  Daniel P. W. Ellis,et al.  Laughter Detection in Meetings , 2004 .

[64]  D. Keltner,et al.  Social Functions of Emotions at Four Levels of Analysis , 1999 .

[65]  Maja Pantic,et al.  Social signal processing: Survey of an emerging domain , 2009, Image Vis. Comput..

[66]  R. Collier Prosodic Systems and Intonation in English , 1969 .

[67]  George N. Votsis,et al.  Emotion recognition in human-computer interaction , 2001, IEEE Signal Process. Mag..

[68]  Maja Pantic,et al.  B-spline polynomial descriptors for human activity recognition , 2008, 2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops.

[69]  N. Campbell,et al.  Conversational speech synthesis and the need for some laughter , 2005, IEEE Transactions on Audio, Speech, and Language Processing.

[70]  Patrick E. Shrout,et al.  Nonverbal behaviors and social evaluation. , 1981 .

[71]  Maja Pantic,et al.  Audiovisual discrimination between laughter and speech , 2008, 2008 IEEE International Conference on Acoustics, Speech and Signal Processing.

[72]  Alex Acero,et al.  Spoken Language Processing , 2001 .

[73]  E. Hall The Silent Language , 1959 .

[74]  T. Chartrand,et al.  The Chameleon Effect as Social Glue: Evidence for the Evolutionary Significance of Nonconscious Mimicry , 2003 .

[75]  Zhihong Zeng,et al.  A Survey of Affect Recognition Methods: Audio, Visual, and Spontaneous Expressions , 2009, IEEE Trans. Pattern Anal. Mach. Intell..

[76]  Amanda C.C. Williams,et al.  Facial expression of pain: An evolutionary account , 2002, Behavioral and Brain Sciences.

[77]  Jeremy N. Bailenson,et al.  Detecting digital chameleons , 2008, Comput. Hum. Behav..

[78]  Alessandro Vinciarelli,et al.  Speakers Role Recognition in Multiparty Audio Recordings Using Social Network Analysis and Duration Distribution Modeling , 2007, IEEE Transactions on Multimedia.

[79]  Brigitte Zellner,et al.  Pauses and the temporal structure of speech , 1995 .

[80]  Simon Lucey,et al.  Investigating Spontaneous Facial Action Recognition through AAM Representations of the Face , 2007 .

[81]  A. Pentland Automatic mapping and modeling of human networks , 2007 .

[82]  Eric Fosler-Lussier,et al.  Combining multiple estimators of speaking rate , 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181).

[83]  Jean-Marc Odobez,et al.  Using audio and video features to classify the most dominant person in a group meeting , 2007, ACM Multimedia.

[84]  Alex Pentland,et al.  Using the influence model to recognize functional roles in meetings , 2007, ICMI '07.

[85]  M. Bartlett,et al.  Machine Analysis of Facial Expressions , 2007 .

[86]  Thomas V. Merluzzi,et al.  Cognitive assessment of social anxiety: Development and validation of a self-statement questionnaire , 1982, Cognitive Therapy and Research.

[87]  E. Berscheid,et al.  What is beautiful is good. , 1972, Journal of personality and social psychology.

[88]  P. Ekman,et al.  What the face reveals : basic and applied studies of spontaneous expression using the facial action coding system (FACS) , 2005 .

[89]  Alex Pentland Socially Aware Computation and Communication , 2005, Computer.

[90]  K. Albrecht Social Intelligence: The New Science of Success , 2005 .

[91]  Spike Cramphorn Blink: The Power of Thinking without Thinking / Strangers to Ourselves: Discovering the Adaptive Unconscious , 2006, Journal of Advertising Research.

[92]  Yi Li,et al.  Human posture recognition using multi-scale morphological method and Kalman motion estimation , 1998, Proceedings. Fourteenth International Conference on Pattern Recognition (Cat. No.98EX170).

[93]  Sharon L. Oviatt,et al.  Toward adaptive conversational interfaces: Modeling speech convergence with animated personas , 2004, TCHI.

[94]  Andreas Stolcke,et al.  Observations on overlap: findings and implications for automatic processing of multi-party conversation , 2001, INTERSPEECH.

[95]  Björn W. Schuller,et al.  Audiovisual recognition of spontaneous interest within conversations , 2007, ICMI '07.

[96]  Wei-Ta Chu,et al.  Movie Analysis Based on Roles' Social Network , 2007, 2007 IEEE International Conference on Multimedia and Expo.