Discriminating speaking styles is an important issue in speech recognition, speaker recognition, and speaker segmentation. This paper compares distance measures between Gaussian distributions for that purpose. The Mahalanobis distance, the Bhattacharyya distance, and the Kullback-Leibler divergence, all commonly used as distance measures between Gaussian distributions, are evaluated in terms of how accurately they discriminate speaking styles. Accuracy is judged on a visualized map in which speaking-style speech corpora are projected onto a two-dimensional space by a multidimensional scaling method. Speaking-style clusters appear clearly grouped on the maps obtained with the Bhattacharyya distance and the Kullback-Leibler divergence. In addition, the visualized map corresponds to speech recognition performance, and the Kullback-Leibler divergence shows higher sensitivity to recognition performance.
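To make the three measures and the mapping step concrete, the following is a minimal sketch and not the authors' implementation. It assumes each speaking-style corpus is summarized by a single Gaussian (mean vector and full covariance), pools the two covariances by averaging for the Mahalanobis distance, symmetrizes the Kullback-Leibler divergence by summing both directions, and uses classical (Torgerson) multidimensional scaling as a stand-in, since the abstract does not say which scaling variant is used. All function names and the toy models are hypothetical.

```python
import numpy as np

def _quad(d, cov):
    # d^T cov^{-1} d, computed without forming the explicit inverse
    return float(d @ np.linalg.solve(cov, d))

def _logdet(cov):
    # log-determinant of a positive-definite covariance matrix
    return np.linalg.slogdet(cov)[1]

def mahalanobis(mu1, cov1, mu2, cov2):
    # Assumption: covariances are pooled by simple averaging.
    return np.sqrt(_quad(mu1 - mu2, 0.5 * (cov1 + cov2)))

def bhattacharyya(mu1, cov1, mu2, cov2):
    cov = 0.5 * (cov1 + cov2)
    mean_term = 0.125 * _quad(mu1 - mu2, cov)
    cov_term = 0.5 * (_logdet(cov) - 0.5 * (_logdet(cov1) + _logdet(cov2)))
    return mean_term + cov_term

def kl_divergence(mu1, cov1, mu2, cov2):
    # KL(N1 || N2) for multivariate Gaussians
    k = mu1.shape[0]
    trace_term = np.trace(np.linalg.solve(cov2, cov1))
    return 0.5 * (trace_term + _quad(mu2 - mu1, cov2) - k
                  + _logdet(cov2) - _logdet(cov1))

def symmetric_kl(mu1, cov1, mu2, cov2):
    # Assumption: symmetrize by summing both directions so the result
    # can serve as a dissimilarity for the mapping step.
    return (kl_divergence(mu1, cov1, mu2, cov2)
            + kl_divergence(mu2, cov2, mu1, cov1))

def pairwise(models, measure):
    # models: list of (mean, covariance) pairs, one per corpus
    n = len(models)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            dist[i, j] = dist[j, i] = measure(*models[i], *models[j])
    return dist

def classical_mds(dist, dim=2):
    # Double-center the squared dissimilarities, then keep the leading
    # eigenvectors scaled by the square roots of their eigenvalues.
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering
    vals, vecs = np.linalg.eigh(gram)
    top = np.argsort(vals)[::-1][:dim]
    # Non-Euclidean dissimilarities can yield negative eigenvalues; clip them.
    return vecs[:, top] * np.sqrt(np.clip(vals[top], 0.0, None))

# Toy stand-ins for three speaking-style corpora (made-up numbers).
toy_models = [
    (np.array([0.0, 0.0]), np.eye(2)),
    (np.array([1.0, 0.5]), np.array([[1.5, 0.2], [0.2, 0.8]])),
    (np.array([3.0, -1.0]), np.array([[0.7, 0.0], [0.0, 2.0]])),
]
toy_map = classical_mds(pairwise(toy_models, bhattacharyya))
```

Passing `symmetric_kl` or `mahalanobis` to `pairwise` in place of `bhattacharyya` yields the corresponding map, so the three measures can be compared on the same set of models.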