Use of broadcast news materials for speech recognition benchmark tests

This paper reports on the use of materials derived from radio and television news broadcasts for research and testing purposes for large vocabulary Continuous Speech Recognition (CSR) technology. Tests using these materials have been implemented by NIST on behalf of the DARPA-funded speech recognition research community in 1995 and 1996, and are expected to continue for the next several years. Four research groups participated in the 1995 tests, and nine groups (at eight sites) participated in the 1996 tests. This paper documents properties of the training and test materials, describes a detailed annotation and transcription protocol that has been used for more than 100 hours of recorded data made available through the Linguistic Data Consortium (LDC), and discusses test protocols and results of both the 1995 and 1996 Benchmark Tests.

For the 1996 tests, a detailed annotation convention was implemented by the LDC to capture these effects. This convention permitted the community to implement a particular form of partitioned evaluation, in which NIST's analyses of results could be partitioned into corresponding categories. For all systems, perhaps the most striking feature of these tests is that word error rates sometimes vary over a wide range from segment to segment. The challenge presented by the broadcast materials is significant: in the 1996 tests, involving both radio and TV broadcast news materials, the system with the lowest measured error rate had an overall word error rate of 27.1%.
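To make the idea of a partitioned evaluation concrete, the sketch below scores word error rate (substitutions + deletions + insertions, divided by the number of reference words) separately for each annotated category, plus an overall figure. It is a minimal illustration under stated assumptions, not NIST's scoring procedure: the function names `align_errors` and `partitioned_wer` and the category labels in the usage example are hypothetical, words are compared as whitespace-separated tokens, and none of NIST's text normalization or scoring rules are reproduced.

```python
from collections import defaultdict


def align_errors(ref, hyp):
    """Return (substitutions, deletions, insertions) for a minimum-edit-distance
    alignment of reference words `ref` against hypothesis words `hyp`."""
    n, m = len(ref), len(hyp)
    # dp[i][j] = (total_errors, subs, dels, ins) aligning ref[:i] with hyp[:j]
    dp = [[None] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = (0, 0, 0, 0)
    for i in range(1, n + 1):
        dp[i][0] = (i, 0, i, 0)  # every reference word deleted
    for j in range(1, m + 1):
        dp[0][j] = (j, 0, 0, j)  # every hypothesis word inserted
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c, s, d, k = dp[i - 1][j - 1]
            miss = 0 if ref[i - 1] == hyp[j - 1] else 1
            diag = (c + miss, s + miss, d, k)  # match or substitution
            c, s, d, k = dp[i - 1][j]
            up = (c + 1, s, d + 1, k)  # deletion of a reference word
            c, s, d, k = dp[i][j - 1]
            left = (c + 1, s, d, k + 1)  # insertion of a hypothesis word
            # Tuples compare on total errors first, so the minimum is optimal.
            dp[i][j] = min(diag, up, left)
    _, s, d, k = dp[n][m]
    return s, d, k


def partitioned_wer(segments):
    """segments: iterable of (category, reference_text, hypothesis_text).
    Returns ({category: WER}, overall WER)."""
    errors = defaultdict(int)
    ref_words = defaultdict(int)
    for category, ref_text, hyp_text in segments:
        ref, hyp = ref_text.split(), hyp_text.split()
        s, d, k = align_errors(ref, hyp)
        errors[category] += s + d + k
        ref_words[category] += len(ref)
    per_category = {c: errors[c] / ref_words[c] for c in ref_words if ref_words[c]}
    total_ref = sum(ref_words.values())
    # Overall WER pools errors and words; it is not the mean of per-category rates.
    overall = sum(errors.values()) / total_ref if total_ref else 0.0
    return per_category, overall


if __name__ == "__main__":
    # Hypothetical categories, chosen only for illustration.
    segments = [
        ("clean", "the president spoke today", "the president spoke today"),
        ("telephone", "we have a caller on the line", "we have caller on a line"),
    ]
    per_category, overall = partitioned_wer(segments)
    print(per_category, overall)
```

In this toy example the "clean" segment has no errors while the "telephone" segment has one deletion and one substitution out of seven reference words (2/7, about 28.6%), so the pooled overall rate is 2/11. Pooling errors and reference words, rather than averaging per-segment rates, keeps short segments from dominating the overall figure, while the per-category rates expose the kind of segment-to-segment variation the partitioned analysis is meant to reveal.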