Equalizing sub-band error rates in speaker recognition

Recent work on ASR by [1] [2] shows that band splitting gives recognition accuracy comparable with the conventional full-band approach. In that work the sub-bands have different bandwidths, spaced on a mel scale. Interestingly, in the context of speaker recognition, improved accuracy has been reported for a full-band approach using a linear scale. We demonstrate that both of these scales are likely to be suboptimal in the context of band splitting. We then describe how sub-band error profiles can lead to a new scale, lying between linear and mel spacing, that gives both an equalised sub-band error profile and improved overall recognition accuracy.
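To make the idea of a scale "between" linear and mel spacing concrete, the following is a minimal sketch only. The abstract does not state how the new scale is constructed, so the blending parameter `alpha`, the function names, and the 0 to 4000 Hz range are all assumptions made for illustration. The mel conversion uses the common 2595 log10(1 + f/700) formula, which may differ from the variant used in the paper.

```python
import math


def hz_to_mel(f_hz):
    """Convert frequency in Hz to mel (common 2595*log10 formula; an assumption)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def band_edges(n_bands, f_low, f_high, alpha):
    """Return n_bands + 1 sub-band edge frequencies in Hz.

    alpha is a hypothetical blending weight, not taken from the paper:
    alpha = 0.0 gives linearly spaced edges, alpha = 1.0 gives mel-spaced
    edges, and values in between interpolate each edge between the two.
    """
    m_low, m_high = hz_to_mel(f_low), hz_to_mel(f_high)
    edges = []
    for i in range(n_bands + 1):
        t = i / n_bands
        linear_edge = f_low + t * (f_high - f_low)
        mel_edge = mel_to_hz(m_low + t * (m_high - m_low))
        edges.append((1.0 - alpha) * linear_edge + alpha * mel_edge)
    return edges


if __name__ == "__main__":
    # Compare the three spacings for four sub-bands over an assumed 0-4000 Hz range.
    for alpha in (0.0, 0.5, 1.0):
        rounded = [round(e) for e in band_edges(4, 0.0, 4000.0, alpha)]
        print(f"alpha={alpha}: {rounded}")
```

In a scheme of this kind, `alpha` could in principle be tuned until the per-band error rates are roughly equal, but that tuning procedure is likewise an assumption and is not described in the abstract.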

[1] John S. D. Mason, et al. Optimization of perceptually-based spectral transforms in speaker identification, 1991, EUROSPEECH.

[2] Misha Pavel, et al. Towards ASR on partially corrupted speech, 1996, Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP '96.

[3] Hervé Bourlard, et al. A new ASR approach based on independent processing and recombination of partial frequency bands, 1996, Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP '96.