Feature transformation for speaker verification under speaking rate mismatch conditions

Speaker verification suffers serious performance degradation under speaking rate mismatch conditions. This degradation can be largely attributed to the spectral distortion caused by different speaking rates. This paper proposes a feature transformation approach that projects speech features of slow speech into the feature space of normal speech. Feature-space maximum likelihood linear regression (fMLLR) is adopted to perform the transform, under the well-known GMM-UBM framework. The proposed approach was evaluated on the CSLT-SPRateDGT2016 corpus, which consists of normal and slow speech. The experiments show that with the transform, the equal error rate (EER) of the GMM-UBM system was reduced by 19.04% relative. More interestingly, the transform learned with the GMM-UBM system improves i-vector systems as well: our experiments show a 10.16% relative EER reduction when the transform was applied to the i-vector/PLDA system.
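For reference, fMLLR applies a single affine transform to each feature vector. The following is a minimal sketch of the standard fMLLR formulation; the symbols ($o_t$, $\mathbf{A}$, $\mathbf{b}$, $\mathbf{W}$) are standard notation chosen here for illustration rather than taken from this paper.

\[
\hat{o}_t = \mathbf{A}\, o_t + \mathbf{b} = \mathbf{W}\, \xi_t,
\qquad
\xi_t = \begin{bmatrix} o_t \\ 1 \end{bmatrix},
\]

where $o_t$ is a feature vector of slow speech at frame $t$, $\hat{o}_t$ is its projection into the normal-speech feature space, and $\mathbf{W} = [\mathbf{A}\;\; \mathbf{b}]$ is estimated by maximizing the likelihood of the transformed features under the GMM-UBM.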