Paper information - Voice-based role separation method and device

Voice-based role separation method and device

A voice-based role separation method and device are disclosed. The method comprises: extracting feature vectors from a voice signal frame by frame to obtain a feature vector sequence (101); allocating role labels to the feature vectors (102); training a deep neural network (DNN) model on the feature vectors that carry role labels (103); and determining, according to the DNN model and a hidden Markov model (HMM) trained on the feature vectors, the role sequence corresponding to the feature vector sequence, and outputting a role separation result (104). The DNN model is configured to output, for each input feature vector, the probabilities of the respective roles, and the HMM is configured to describe the transition relationship between the roles. Because the speaker roles are modeled with a DNN, which has strong feature extraction capability, the method describes each role in finer detail and more accurately than a conventional Gaussian mixture model (GMM), and therefore provides a more accurate role separation result.
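Step (104) combines per-frame role probabilities with role transition probabilities. The abstract does not name a decoding algorithm, so the following is only a minimal sketch that assumes standard Viterbi decoding and uses the role probabilities directly as emission scores. The function name `viterbi_roles`, the two-role setup, and all numeric values are hypothetical and chosen for illustration; the hand-written `posteriors` list stands in for the output of a trained DNN.

```python
import math


def viterbi_roles(posteriors, transitions, initial):
    """Return the most likely role index for each frame.

    posteriors[t][r]  : probability of role r at frame t (stand-in for DNN output)
    transitions[i][j] : probability of moving from role i to role j (HMM)
    initial[r]        : probability of starting in role r
    """
    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    n_roles = len(initial)
    # score[r]: best log-score of any path that ends in role r at the current frame
    score = [log(initial[r]) + log(posteriors[0][r]) for r in range(n_roles)]
    backptr = []
    for frame in posteriors[1:]:
        new_score, pointers = [], []
        for j in range(n_roles):
            # Best predecessor role for landing in role j at this frame
            best_i = max(range(n_roles),
                         key=lambda i: score[i] + log(transitions[i][j]))
            pointers.append(best_i)
            new_score.append(score[best_i] + log(transitions[best_i][j]) + log(frame[j]))
        score = new_score
        backptr.append(pointers)

    # Trace back from the best-scoring final role
    role = max(range(n_roles), key=lambda r: score[r])
    path = [role]
    for pointers in reversed(backptr):
        role = pointers[role]
        path.append(role)
    path.reverse()
    return path


# Hypothetical two-role example: frame 2 is noisy and leans toward role 1.
posteriors = [[0.9, 0.1], [0.8, 0.2], [0.4, 0.6], [0.9, 0.1],
              [0.2, 0.8], [0.1, 0.9], [0.1, 0.9]]
transitions = [[0.9, 0.1], [0.1, 0.9]]  # roles tend to persist between frames
initial = [0.5, 0.5]

role_sequence = viterbi_roles(posteriors, transitions, initial)
print(role_sequence)
```

With these illustrative values, picking the most probable role independently for each frame would flip to role 1 at the noisy frame 2, whereas the transition probabilities keep the decoded sequence in role 0 until the sustained change at frame 4. This is the kind of smoothing that a transition model between roles is meant to provide.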

李晓辉 | 李宏言
