Revisiting scenarios and methods for variable frame rate analysis in automatic speech recognition

In this paper we present a review and evaluation of some of the main methods used in variable frame rate (VFR) analysis, applied to speech recognition systems. Previous work in this area usually deals with restricted conditions and scenarios. We have revisited the main algorithmic alternatives and evaluated them under a common experimental framework, which allows us to compare them objectively and select the most adequate strategy. We also show to what extent VFR analysis is useful in its three main application scenarios, namely “reducing computational load”, “improving acoustic modelling” and “handling additive noise conditions in the time domain”. From our evaluation on a difficult large-vocabulary telephone task, we establish that VFR analysis does not significantly improve on the results obtained with traditional fixed frame rate (FFR) analysis, except when additive noise is present in the database, especially at low SNRs.
