Evaluating acoustic distance measures for template based recognition

In this paper we investigate the behaviour of different acoustic distance measures for template-based speech recognition when acoustic distances are combined with linguistic knowledge and template concatenation (fluency) costs. To that end, different acoustic distance measures are compared on tasks with varying levels of fluency and linguistic constraints. We show that adopting these constraints invariably results in a template sequence that is clearly suboptimal in purely acoustic terms being chosen as the winning hypothesis. This has strong implications for the design of acoustic distance measures: a distance measure that is optimal for frame-based classification may prove suboptimal for full-sentence recognition. In particular, we show that this is the case when comparing the Euclidean distance with the recently introduced adaptive kernel local Mahalanobis distance.
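To make the contrast between the two frame-level distance measures concrete, the sketch below compares a plain Euclidean distance with a generic locally estimated Mahalanobis distance between speech feature frames. It is a minimal illustration under assumed choices (a k-nearest-neighbour covariance estimate, a small diagonal regulariser, random toy features standing in for MFCC frames); it does not reproduce the adaptive kernel local Mahalanobis measure studied in the paper.

```python
# Illustrative sketch only: a generic local Mahalanobis distance estimated from
# the k nearest reference frames, contrasted with the plain Euclidean distance.
# The k-NN covariance estimate, the regulariser and the toy data below are
# assumptions made for illustration, not the authors' formulation.
import numpy as np


def euclidean_distance(x, y):
    """Plain Euclidean distance between two feature frames."""
    return float(np.linalg.norm(x - y))


def local_mahalanobis_distance(x, y, reference_frames, k=50, reg=1e-3):
    """Mahalanobis-type distance using a covariance estimated locally
    around the template frame y from its k nearest reference frames."""
    # Find the k reference frames closest to y (the local neighbourhood).
    d2 = np.sum((reference_frames - y) ** 2, axis=1)
    neighbours = reference_frames[np.argsort(d2)[:k]]
    # Local covariance with a small diagonal regulariser for numerical stability.
    cov = np.cov(neighbours, rowvar=False) + reg * np.eye(y.shape[0])
    diff = x - y
    return float(np.sqrt(diff @ np.linalg.solve(cov, diff)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dim = 12                                  # e.g. 12 cepstral coefficients
    reference = rng.normal(size=(2000, dim))  # pooled training frames (toy data)
    test_frame = rng.normal(size=dim)
    template_frame = rng.normal(size=dim)

    print("Euclidean:        ", euclidean_distance(test_frame, template_frame))
    print("local Mahalanobis:", local_mahalanobis_distance(
        test_frame, template_frame, reference))
```

The point of the local covariance is that it rescales each frame comparison to the variability observed around the template frame; a measure tuned this way may rank individual frames better, yet, once concatenation costs and linguistic constraints enter the sentence-level search, it need not yield the best full-sentence hypothesis.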
