Paper information - Corpus description of the ESTER Evaluation Campaign for the Rich Transcription of French Broadcast News

This paper presents the audio corpus developed for the ESTER evaluation campaign of French broadcast news transcription systems. The corpus comprises 100 hours of manually annotated recordings and 1,677 hours of non-transcribed data. The manual annotations include a detailed verbatim orthographic transcription, speaker turns and identities, information about acoustic conditions, and named entities. Additional resources generated by automatic speech processing systems, such as phonetic alignments and word graphs, are also described.
