Utilizing Depth Sensors for Analyzing Multimodal Presentations: Hardware, Software and Toolkits

Body language plays an important role in learning processes and communication. For example, communication research has produced evidence that mathematical knowledge can be embodied in the gestures made by teachers and students. Likewise, speakers use body postures and gestures in oral presentations to convey ideas and emphasize important messages. Consequently, capturing and analyzing non-verbal behaviors is an important aspect of multimodal learning analytics (MLA) research. With regard to sensing capabilities, the introduction of depth sensors such as the Microsoft Kinect has greatly facilitated research and development in this area. However, rapid advances in hardware and software capabilities are not always in sync with the expanding set of features reported in the literature. For example, although Anvil is a widely used state-of-the-art toolkit for annotating and visualizing motion traces, its motion recording component, which is based on OpenNI, is outdated. As part of our research in developing multimodal educational assessments, we began an effort to develop and standardize algorithms for multimodal feature extraction and the construction of automated scoring models. This paper provides an overview of relevant multimodal research on educational tasks and then summarizes our work on using multimodal sensors to develop assessments of communication skills, with particular attention to depth sensors. Specifically, we focus on the task of public speaking assessment using the Microsoft Kinect. Finally, we introduce an open-source Python package for computing expressive body language features from Kinect motion data, which we hope will benefit the MLA research community.
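
To make "expressive body language features" concrete, the sketch below shows how one such feature, a per-frame kinetic-energy proxy of overall body movement, might be computed from recorded Kinect skeleton traces. This is a minimal illustration rather than the package's actual API: the (n_frames, n_joints, 3) array layout, the fps parameter, and the function name kinetic_energy_proxy are assumptions made for this example.

```python
import numpy as np

def kinetic_energy_proxy(joint_positions, fps=30.0):
    """Per-frame kinetic-energy proxy from Kinect skeleton traces.

    joint_positions: array of shape (n_frames, n_joints, 3) holding
    joint coordinates in meters, e.g. exported from Kinect skeleton
    recordings. Returns one scalar per frame transition summarizing
    overall body movement.
    """
    # Finite-difference joint velocities between consecutive frames (m/s).
    velocities = np.diff(joint_positions, axis=0) * fps
    # Sum of squared joint speeds; a true kinetic energy would weight
    # each joint by its body-segment mass, omitted here for brevity.
    return (velocities ** 2).sum(axis=(1, 2))

# Example: 90 frames (about 3 s at 30 fps) of the 25 joints tracked
# by the Kinect v2, filled with random data for demonstration.
frames = np.random.rand(90, 25, 3)
energy = kinetic_energy_proxy(frames)
print(energy.shape)  # (89,)
```

Aggregates of such per-frame signals over a presentation (means, variances, peaks) are typical inputs to the kind of automated scoring models discussed above.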
