LMELECTURES: A Multimedia Corpus of Academic Spoken English

This paper describes the acquisition, transcription and annotation of a multimedia corpus of academic spoken English, the LMELectures. It consists of two lecture series that were given in the summer term 2009 at the computer science department of the University of Erlangen-Nuremberg, covering topics in pattern analysis, machine learning and interventional medical image processing. In total, about 40 hours of high-definition audio and video of a single speaker were acquired in a constant recording environment. In addition to the recordings, the presentation slides are available in machine-readable (PDF) format. The manual annotations include a suggested segmentation into speech turns and a complete manual transcription, which was produced using BLITZSCRIBE2, a new tool for rapid transcription. For one lecture series, the lecturer assigned keywords to each recording; one recording of that series was further annotated by five human annotators, each of whom produced a list of ranked key phrases. The corpus is available for non-commercial purposes upon request.

Index Terms: corpus description, academic spoken English, e-learning
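The abstract lists several annotation layers that attach to each recording: a segmentation into speech turns, a manual transcription, lecturer-assigned keywords (one series only) and per-annotator ranked key phrases (one recording only). The following is a minimal Python sketch of how these layers might be held together in memory when working with the corpus. The class and field names (`Turn`, `Recording`, `lecturer_keywords`, `ranked_key_phrases`) are hypothetical and do not describe the corpus's actual distribution format, which is not specified here.

```python
from dataclasses import dataclass, field


@dataclass
class Turn:
    """One speech turn from the suggested segmentation (times in seconds)."""
    start: float
    end: float
    text: str  # manual transcription of this turn

    @property
    def duration(self) -> float:
        return self.end - self.start


@dataclass
class Recording:
    """Hypothetical container for the annotation layers of one lecture recording."""
    recording_id: str
    series: str
    turns: list[Turn] = field(default_factory=list)
    # Keywords assigned by the lecturer (only available for one lecture series).
    lecturer_keywords: list[str] = field(default_factory=list)
    # Ranked key phrases keyed by annotator, best phrase first (one recording only).
    ranked_key_phrases: dict[str, list[str]] = field(default_factory=dict)

    def speech_seconds(self) -> float:
        """Total duration covered by the speech turns."""
        return sum(t.duration for t in self.turns)

    def transcript(self) -> str:
        """Concatenate the turn transcriptions in time order."""
        return " ".join(t.text for t in sorted(self.turns, key=lambda t: t.start))


if __name__ == "__main__":
    rec = Recording(
        recording_id="lecture-01",  # hypothetical identifier
        series="series-A",          # hypothetical series label
        turns=[Turn(4.0, 9.5, "so let us continue"), Turn(0.0, 4.0, "good morning")],
    )
    print(rec.speech_seconds())  # 9.5
    print(rec.transcript())      # good morning so let us continue
```

Keeping the turns as timed segments rather than one flat transcript lets the text stay aligned with the audio and video, and the optional keyword and key-phrase layers simply remain empty for recordings that were not annotated with them.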
