Speaker impact on audience comprehension for academic presentations

Understanding how well an audience comprehends a presentation has the potential to enable richer and more focused interaction with audio-visual recordings. We describe an investigation into the automated analysis and classification of audience comprehension of academic presentations. We identify the audio and visual features considered most helpful to audience understanding. To obtain gold-standard comprehension levels, human annotators watched contiguous video segments from a corpus of academic presentations and estimated how much of the content they understood. We investigate early (feature-level) and late (decision-level) fusion strategies over a number of input streams and identify the modalities most effective for comprehension classification. We demonstrate that it is possible to build a classifier to predict potential audience comprehension levels, achieving 52.9% accuracy on a 7-class task and 85.4% on a binary classification task.
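To make the two fusion strategies concrete, the following is a minimal sketch in Python with scikit-learn. Early fusion concatenates per-modality feature vectors before training a single classifier; late fusion trains one classifier per modality and combines their class-probability estimates. Everything here is an illustrative assumption rather than the paper's pipeline: the feature arrays (audio_X, visual_X), segment count, labels, and the RandomForest choice are hypothetical stand-ins, whereas the study itself drew features from tools such as openSMILE, Praat, and OpenCV and classified with Weka.

```python
# Sketch of early (feature-level) vs. late (decision-level) fusion for
# comprehension classification. All data below is synthetic and hypothetical.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
n = 200                              # annotated video segments (hypothetical)
audio_X = rng.normal(size=(n, 40))   # e.g. prosodic/spectral features per segment
visual_X = rng.normal(size=(n, 20))  # e.g. face/head-movement features per segment
y = rng.integers(0, 7, size=n)       # 7-point comprehension labels from annotators

# Early fusion: concatenate modality features, train one classifier.
early_X = np.hstack([audio_X, visual_X])
early_pred = cross_val_predict(RandomForestClassifier(random_state=0),
                               early_X, y, cv=5)
print("early fusion accuracy:", accuracy_score(y, early_pred))

# Late fusion: one classifier per modality, then average their
# class-probability estimates and take the argmax.
audio_proba = cross_val_predict(RandomForestClassifier(random_state=0),
                                audio_X, y, cv=5, method="predict_proba")
visual_proba = cross_val_predict(RandomForestClassifier(random_state=0),
                                 visual_X, y, cv=5, method="predict_proba")
late_pred = np.argmax((audio_proba + visual_proba) / 2, axis=1)
print("late fusion accuracy:", accuracy_score(y, late_pred))
```

Averaging posterior probabilities is only one late-fusion rule; weighted combination or a meta-classifier trained on the per-modality outputs are common alternatives.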
