Personalized Multimodal Feedback Generation in Education

Automatic evaluation of school assignments is an important application of AI in education. In this work, we focus on the task of personalized multimodal feedback generation, which aims to generate feedback on students' assignments that involve multimodal inputs such as images, audio, and text, personalized for different teachers. This task involves representing and fusing multimodal information as well as natural language generation, and it poses challenges in three respects: 1) how to encode and integrate multimodal inputs; 2) how to generate feedback specific to each modality; and 3) how to personalize the generated feedback. In this paper, we propose a novel Personalized Multimodal Feedback Generation Network (PMFGN), equipped with a modality gate mechanism and a personalized bias mechanism, to address these challenges. Extensive experiments on real-world K-12 education data show that our model significantly outperforms several baselines, generating more accurate and diverse feedback. In addition, we conduct detailed ablation experiments to deepen our understanding of the proposed framework.
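The abstract names two mechanisms, a modality gate and a personalized bias, without specifying them. The following is a minimal Python sketch of one common way such mechanisms can be realized, under hypothetical assumptions: each modality's encoded vector is scaled by a scalar sigmoid gate before summation, and personalization is a learned per-teacher bias vector added to the decoder's vocabulary logits. The function names (`modality_gate_fusion`, `personalized_logits`), the toy dimensions, the vocabulary, and the teacher IDs are illustrative assumptions, not PMFGN's actual formulation.

```python
import math
from typing import Dict, List

Vector = List[float]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def modality_gate_fusion(
    features: Dict[str, Vector],
    gate_weights: Dict[str, Vector],
    gate_bias: Dict[str, float],
) -> Vector:
    """Hypothetical modality gate: fused = sum_m sigmoid(w_m . h_m + b_m) * h_m."""
    dim = len(next(iter(features.values())))
    fused = [0.0] * dim
    for name, h in features.items():
        # Scalar gate in (0, 1) controlling how much modality `name` contributes.
        g = sigmoid(sum(w * x for w, x in zip(gate_weights[name], h)) + gate_bias[name])
        for i, x in enumerate(h):
            fused[i] += g * x
    return fused


def personalized_logits(
    base_logits: Vector, teacher_bias: Dict[str, Vector], teacher_id: str
) -> Vector:
    """Hypothetical personalized bias: add a per-teacher vector to vocabulary logits."""
    return [z + b for z, b in zip(base_logits, teacher_bias[teacher_id])]


def softmax(logits: Vector) -> Vector:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    # Toy encoded features for one assignment (2-dimensional for readability).
    features = {"image": [1.0, 0.0], "audio": [0.0, 2.0], "text": [0.5, 0.5]}
    zero_w = {m: [0.0, 0.0] for m in features}
    zero_b = {m: 0.0 for m in features}
    print(modality_gate_fusion(features, zero_w, zero_b))
    # -> [0.75, 1.25]  (every gate is sigmoid(0) = 0.5)

    # Hypothetical vocabulary and teacher-specific biases.
    vocab = ["good", "careful", "pronunciation"]
    teacher_bias = {"t1": [2.0, 0.0, 0.0], "t2": [0.0, 0.0, 2.0]}
    for tid in teacher_bias:
        probs = softmax(personalized_logits([1.0, 1.0, 1.0], teacher_bias, tid))
        print(tid, vocab[probs.index(max(probs))])
    # -> t1 good
    # -> t2 pronunciation
```

With all gate parameters at zero, every modality receives weight 0.5; training would learn the gate weights and biases so that, for instance, one channel can dominate when it carries most of the evidence. An additive per-teacher bias is one of the simplest ways to shift word choice toward an individual teacher while sharing the rest of the decoder; the mechanisms used in PMFGN may differ.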
