Exploring nuisance attribute projection and score normalization for GLDS-SVM based automatic mispronunciation detection method

In mispronunciation detection, cross-speaker degradation and other confounding nuisance factors remain challenging problems. In this paper, we attempt to remove non-pronunciation variation in the GLDS-SVM expansion space using a nuisance attribute projection (NAP) strategy, in order to increase the separability of different phoneme instances. In addition, several score normalization methods, namely softmax, posterior probability vector (PPV), Z-norm and T-norm, are compared. Experiments on three speech corpora demonstrate the effectiveness of these methods: the performance improvement is modest but consistent.
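To make the two ideas concrete, the following is a minimal sketch, not the authors' implementation. It assumes expansion vectors are stored as rows of a NumPy array, that rows sharing a group label belong to the same phoneme class (so within-group variation is treated as nuisance), and that the nuisance subspace is estimated from the top singular vectors of the within-group scatter. The function names `nap_basis`, `apply_nap`, `z_norm` and `softmax_norm` are hypothetical.

```python
import numpy as np

def nap_basis(X, groups, k):
    """Estimate an orthonormal basis U (d x k) of the assumed nuisance subspace.

    Rows of X with the same group label are assumed to share the same
    pronunciation class, so deviations from the group mean are treated
    as nuisance (e.g. speaker) variation.
    """
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)
    centered = np.empty_like(X)
    for g in np.unique(groups):
        mask = groups == g
        centered[mask] = X[mask] - X[mask].mean(axis=0)
    # Right singular vectors = principal directions of within-group scatter.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:k].T

def apply_nap(X, U):
    """Project out the nuisance subspace: x -> (I - U U^T) x, row-wise."""
    X = np.asarray(X, dtype=float)
    return X - (X @ U) @ U.T

def z_norm(score, cohort_scores):
    """Z-norm: standardize a score with the mean and standard deviation of
    cohort scores obtained for the same model. T-norm uses the same formula,
    with statistics taken from cohort models scored on the test segment."""
    cohort_scores = np.asarray(cohort_scores, dtype=float)
    return (score - cohort_scores.mean()) / cohort_scores.std()

def softmax_norm(scores):
    """Softmax over the scores of competing phoneme models."""
    s = np.asarray(scores, dtype=float)
    e = np.exp(s - s.max())  # subtract the max for numerical stability
    return e / e.sum()
```

In this sketch the same matrix `U` would be applied to training and test vectors before SVM training and scoring; the rank `k` is a tuning parameter that the abstract does not specify.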
