A Multimodal Corpus for the Assessment of Public Speaking Ability and Anxiety

The ability to speak effectively in public is an essential asset in many professions and in everyday life. Tools that improve public speaking performance and that assess and mitigate public speaking anxiety would therefore be very useful. Multimodal interaction technologies, such as computer vision and embodied conversational agents, have recently been investigated for the training and assessment of interpersonal skills. One central requirement for these technologies is multimodal corpora for training machine learning models. This paper addresses this need by presenting and sharing a multimodal corpus of public speaking presentations. These presentations were collected in an experimental study investigating the potential of interactive virtual audiences for public speaking training. The corpus includes audio-visual data and automatically extracted features, measures of public speaking anxiety and personality, annotations of participants' behaviors, and expert ratings of behavioral aspects and overall performance of the presenters. We hope this corpus will help other research teams develop tools that support public speaking training.
