Understanding the Teaching Styles by an Attention based Multi-task Cross-media Dimensional Modeling

Teaching style plays an influential role in helping students achieve academic success. In this paper, we explore the new problem of effectively understanding teachers' teaching styles. Specifically, we study 1) how to quantitatively characterize the teaching styles of various teachers and 2) how to model the subtle relationship between cross-media teaching-related data (speech, facial expressions, body motions, content, etc.) and teaching styles. Using adjectives selected from more than 10,000 feedback questionnaires provided by an educational enterprise, a novel concept called the Teaching Style Semantic Space (TSSS) is developed, based on the pleasure-arousal dimensional theory, to describe teaching styles quantitatively and comprehensively. Then a multi-task deep learning model, the Attention-based Multi-path Multi-task Deep Neural Network (AMMDNN), is proposed to accurately and robustly capture the internal correlations between cross-media features and the TSSS. We further develop a comprehensive benchmark dataset including 4,541 fully annotated cross-modality teaching classes. Our experimental results demonstrate that the proposed AMMDNN outperforms the baseline methods (by +0.0842 in terms of the concordance correlation coefficient (CCC) on average). To further demonstrate the advantages of the proposed TSSS and our model, several interesting case studies are carried out, such as comparing teaching styles across different teachers and courses, and leveraging the proposed method for teaching quality analysis.
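The evaluation metric above, the concordance correlation coefficient (CCC), measures agreement between predicted and ground-truth dimensional annotations. As a minimal sketch (not taken from the paper's code), Lin's formulation of CCC can be computed as follows; the function name `ccc` and the sample arrays are illustrative:

```python
import numpy as np

def ccc(y_true, y_pred):
    """Concordance correlation coefficient (Lin, 1989):
    2*cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))^2).
    Ranges over [-1, 1]; 1 means perfect agreement."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mu_t, mu_p = y_true.mean(), y_pred.mean()
    var_t, var_p = y_true.var(), y_pred.var()       # population variance
    cov = ((y_true - mu_t) * (y_pred - mu_p)).mean()
    return 2.0 * cov / (var_t + var_p + (mu_t - mu_p) ** 2)

# Illustrative check: identical sequences agree perfectly.
print(ccc([0.1, 0.4, 0.8], [0.1, 0.4, 0.8]))  # → 1.0
```

Unlike the Pearson correlation, CCC also penalizes differences in mean and scale between predictions and labels, which is why it is the standard metric for dimensional (pleasure-arousal style) regression tasks.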
