This paper proposes a fast and robust text-independent speaker identification system for all types of radio networks. Radio conversations contain speech from various speakers along with radio noise. A novel approach, named Receiver Noise Segmentation (RxNSeg), is proposed to segment radio conversations into speaker-homogeneous speech segments: it first identifies the receiver radio noise and then finds the boundaries of the speaker-homogeneous speech segments in the conversation. Various techniques for clustering speech segments into speaker-homogeneous clusters, which are then used to train speaker models, are evaluated. A novel top-down approach, named Find One Long Speech Segment (FOLSS), is proposed in lieu of traditional clustering techniques; it finds at least one long speaker-homogeneous segment for each speaker present in a radio conversation. Two speaker modeling methods, the Gaussian Mixture Model (GMM) and the adapted GMM, are considered. Both speaker modeling methods, when combined with the proposed RxNSeg and FOLSS, show an average 86.32% reduction in testing time without significant loss of speaker identification accuracy compared to traditional segmentation and clustering techniques.
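The abstract does not give the details of RxNSeg, so the following is only a minimal sketch of the general idea of cutting a conversation at detected receiver-noise runs. It assumes a per-frame noise flag is already available from some detector; the function names `speech_segments` and `longest_segment`, the `min_len` parameter, and the frame-index representation are all hypothetical and not taken from the paper.

```python
from typing import List, Optional, Tuple

Segment = Tuple[int, int]  # (start_frame, end_frame), end exclusive


def speech_segments(is_noise: List[bool], min_len: int = 1) -> List[Segment]:
    """Return runs of consecutive non-noise frames.

    Assumption: is_noise[i] is True when frame i was flagged as receiver
    noise by an upstream detector (not implemented here).
    """
    segments: List[Segment] = []
    start: Optional[int] = None
    for i, noisy in enumerate(is_noise):
        if not noisy and start is None:
            start = i                      # a speech run begins
        elif noisy and start is not None:
            if i - start >= min_len:       # keep only runs that are long enough
                segments.append((start, i))
            start = None                   # the speech run ended at a noise frame
    # Close a run that extends to the end of the recording.
    if start is not None and len(is_noise) - start >= min_len:
        segments.append((start, len(is_noise)))
    return segments


def longest_segment(segments: List[Segment]) -> Optional[Segment]:
    """Return the longest segment, or None if there are no segments."""
    if not segments:
        return None
    return max(segments, key=lambda s: s[1] - s[0])


if __name__ == "__main__":
    flags = [True, False, False, True, True, False, False, False, True]
    segs = speech_segments(flags)
    print(segs)
    print(longest_segment(segs))
```

Note that `longest_segment` only picks the longest run overall. FOLSS, as described above, finds at least one long segment for each speaker, which requires speaker information that this sketch does not model.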