The INTERSPEECH 2013 Computational Paralinguistics Challenge: Social Signals, Conflict, Emotion, Autism

The INTERSPEECH 2013 Computational Paralinguistics Challenge provides, for the first time, a unified test-bed for Social Signals such as laughter in speech. It further introduces conflict in group discussions as a new task and addresses autism and its manifestations in speech. Finally, emotion is revisited as a task, albeit with a broader set of twelve enacted emotional states. In this paper, we describe these four Sub-Challenges, their conditions, their baselines, and a new feature set extracted with the openSMILE toolkit that is provided to the participants.

Index Terms: Computational Paralinguistics, Challenge, Social Signals, Conflict, Emotion, Autism
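
As a rough illustration only (not part of the original paper), the minimal sketch below shows how a ComParE-style acoustic feature vector can be extracted with openSMILE's Python wrapper. The opensmile package, the file name example.wav, and the choice of the ComParE_2016 feature set (assumed here as a stand-in for the challenge's 2013 feature set, which it follows closely) are assumptions, not details taken from the paper.

    # Minimal sketch: extracting ComParE-style acoustic functionals with the
    # openSMILE Python wrapper. Package, feature-set choice, and file name
    # are illustrative assumptions, not specifics from the paper.
    import opensmile

    smile = opensmile.Smile(
        feature_set=opensmile.FeatureSet.ComParE_2016,      # assumed stand-in for the 2013 ComParE set
        feature_level=opensmile.FeatureLevel.Functionals,   # one feature vector (functionals) per file
    )

    # Returns a pandas DataFrame with one row and one column per acoustic feature.
    features = smile.process_file("example.wav")
    print(features.shape)

In a challenge-style setup, such per-file feature vectors would then be fed to a standard classifier to reproduce baseline results; the exact baseline configuration is described in the paper itself.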
