Understanding Public Speakers' Performance: First Contributions to Support a Computational Approach

Communication is part of our everyday life, and our ability to communicate can play a significant role in a variety of personal, academic, and professional contexts. The characterization of what makes a good communicator has long been a subject of research and debate across several fields, particularly Education, with a focus on improving the performance of teachers. In this context, the literature suggests that the ability to communicate is defined not only by the verbal component, but also by a plethora of non-verbal contributions that provide redundant or complementary information and, at times, constitute the message itself. However, even though we can recognize a good or a bad communicator, objectively little is known about which aspects, and to what extent, define the quality of a presentation. The goal of this work is to lay the groundwork for studying the defining characteristics of a good communicator in a more systematic and objective manner. To this end, we conceptualize, and provide a first prototype for, a computational approach that characterizes the different elements involved in communication from audiovisual data, illustrating the outcomes and applicability of the proposed methods on a video database of public speakers.
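
To give a concrete sense of the kind of building block such a computational approach rests on, the sketch below extracts frame-level acoustic descriptors (energy, zero-crossing rate, spectral statistics, MFCCs) from a recording of a talk using the open-source pyAudioAnalysis toolkit. This is a minimal illustration only: the toolkit choice, the file name, and the window sizes are assumptions, not the pipeline prescribed by this work.

```python
# Minimal sketch: frame-level acoustic feature extraction from a recorded
# presentation with pyAudioAnalysis. File name and window/hop sizes are
# illustrative assumptions.
from pyAudioAnalysis import audioBasicIO, ShortTermFeatures

# Read the recording and collapse it to a single channel.
fs, signal = audioBasicIO.read_audio_file("speaker_talk.wav")  # hypothetical file
signal = audioBasicIO.stereo_to_mono(signal)

# 50 ms analysis windows with a 25 ms hop yield per-frame descriptors
# (energy, zero-crossing rate, spectral centroid/spread, MFCCs, ...).
features, feature_names = ShortTermFeatures.feature_extraction(
    signal, fs, 0.050 * fs, 0.025 * fs
)

# features is a (num_descriptors x num_frames) matrix; feature_names labels its rows.
print(feature_names[:5], features.shape)
```

Descriptors like these, aggregated over a presentation and combined with visual cues (e.g., facial behavior and body pose), are the type of raw material from which speaker performance can be characterized.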
