Have your Cake and Eat it Too: Foreign Language Learning with a Crowdsourced Video Captioning System

Learning from captioned foreign language videos is highly effective, but the availability of such videos is limited. By using speech-to-text technology to generate partially correct transcripts as a starting point, we see an opportunity for learners to build accurate foreign language captions while learning at the same time. We present a system in which learners correct captions, using automatic transcription and machine-generated suggested alternative words as scaffolding. In a lab study with 49 participants, we found that, compared to watching the video with accurate captions, learning and quality of experience were not significantly impaired by the secondary caption correction task, under interface designs either with or without scaffolding from speech-to-text generated alternative words. Moreover, aggregating corrections reduced word error rate from 19% to 5.5% without scaffolding from suggested alternatives, and to 1.8% with scaffolding. Feedback from participants suggests that emphasizing the learning community contribution aspect is important for motivating learners and reducing frustration.
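For readers unfamiliar with the metric, word error rate is conventionally the word-level edit distance between a hypothesis transcript and a reference transcript, divided by the number of reference words. The sketch below illustrates that standard definition, together with a deliberately simple position-wise majority vote over several learners' corrections. The abstract does not describe how the system aggregates corrections, so `majority_vote`, its equal-length assumption, and the example sentences are hypothetical illustrations rather than the authors' method.

```python
from collections import Counter


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def majority_vote(corrections: list[str]) -> str:
    """Hypothetical aggregation: pick the most common word at each position.

    Assumes every correction has the same number of words, which real
    learner corrections would not guarantee.
    """
    token_lists = [c.split() for c in corrections]
    if len({len(tokens) for tokens in token_lists}) != 1:
        raise ValueError("corrections must be non-empty and of equal length")
    return " ".join(
        Counter(column).most_common(1)[0][0] for column in zip(*token_lists)
    )


if __name__ == "__main__":
    reference = "the cat sat on the mat"
    automatic = "the bat sat in the mat"
    learner_corrections = [
        "the cat sat on the mat",
        "the cat sat in the mat",
        "the bat sat on the mat",
    ]
    aggregated = majority_vote(learner_corrections)
    print(word_error_rate(reference, automatic))
    print(word_error_rate(reference, aggregated))
```

In this toy example each learner fixes a different subset of the automatic transcript's errors, and the vote recovers the reference even though only one individual correction is fully accurate.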
