Multimodal Capture of Teacher-Student Interactions for Automated Dialogic Analysis in Live Classrooms

We focus on data collection designs for the automated analysis of teacher-student interactions in live classrooms, with the goal of identifying instructional activities (e.g., lecturing, discussion) and assessing the quality of dialogic instruction (e.g., analysis of questions). Our designs were motivated by multiple technical requirements and constraints. Most importantly, teachers could be individually miked, but their audio needed to be of excellent quality for automatic speech recognition (ASR) and spoken utterance segmentation. Individual students could not be miked, but classroom audio quality only needed to be sufficient to detect student spoken utterances. Visual information could be recorded only if students could not be identified. Design 1 used an omnidirectional laptop microphone to record both teacher and classroom audio and was quickly deemed unsuitable. In Designs 2 and 3, teachers wore a Samson AirLine 77 wireless vocal headset system, which uses a unidirectional microphone with a cardioid pickup pattern. In Design 2, classroom audio was recorded with dual first-generation Microsoft Kinects placed at the front corners of the classroom. Design 3 used a Crown PZM-30D pressure zone microphone mounted on the blackboard to record classroom audio. Designs 2 and 3 were tested by recording audio in 38 live middle school classrooms from six U.S. schools while trained human coders simultaneously performed live coding of classroom discourse. Qualitative and quantitative analyses revealed that Design 3 was suitable for three of our core tasks: (1) ASR on teacher speech (word recognition rate of 66% and word overlap rate of 69% using the Google Speech ASR engine); (2) teacher utterance segmentation (F-measure of 97%); and (3) student utterance segmentation (F-measure of 66%). Ideas for incorporating video and skeletal tracking with dual second-generation Kinects to produce Design 4 are discussed.
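The ASR and segmentation results above are reported as a word recognition rate and as F-measures. The following is a minimal sketch of how such figures can be computed, under stated assumptions: word recognition rate is taken here as 1 minus the word error rate (WER), with WER obtained from a word-level Levenshtein alignment against a human transcript, and the segmentation F-measure is computed from counts of matched (tp), spurious (fp), and missed (fn) utterances. The exact matching criteria used in the study (for example, temporal overlap thresholds between detected and coded utterances) are not stated here, and the word overlap rate is not reproduced; all function names are hypothetical.

```python
from typing import Sequence


def word_edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Minimum number of word substitutions, deletions, and insertions
    needed to turn the reference transcript into the ASR hypothesis."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # reference word deleted
                curr[j - 1] + 1,         # extra hypothesis word inserted
                prev[j - 1] + (r != h),  # substitution (0 cost if words match)
            )
        prev = curr
    return prev[-1]


def word_recognition_rate(reference: str, hypothesis: str) -> float:
    """Assumed definition: WRR = 1 - WER, where WER = edits / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference transcript is empty")
    return 1.0 - word_edit_distance(ref, hyp) / len(ref)


def f_measure(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall from utterance match counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Usage: one substituted word out of five reference words gives WRR = 0.8.
print(word_recognition_rate("what is the main idea", "what is a main idea"))
print(f_measure(tp=8, fp=2, fn=2))
```

Note that under this definition WRR can fall below zero when the hypothesis contains many insertions, which is one reason studies sometimes report an order-insensitive overlap measure alongside it.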
