MATBN: A Mandarin Chinese Broadcast News Corpus

The MATBN Mandarin Chinese broadcast news corpus contains a total of 198 hours of broadcast news from the Public Television Service Foundation (Taiwan) with corresponding transcripts. The primary purpose of this collection is to provide training and testing data for continuous speech recognition evaluation in the broadcast news domain. In this paper, we briefly introduce. the speech corpus and report on some preliminary statistical analysis and speech recognition evaluation results.