Data Selection for Broadcast News CSR Evaluations

This paper discusses the composition of the 1997 Hub-4 broadcast news test set. The composition is based on the concurrent selection of a statistically equivalent test set for a future evaluation, adjustment of the set to match the training data, and other considerations. Both the principles involved and the specific algorithms used are covered.
