Segmentation and relevance measure for speaker verification

In all efficient speaker recognition systems, the decision score is the average of the likelihood ratio computed on each frame of the sentence. Except for the non-speech frames, which are rejected, every frame has the same weight in this average. This paper studies the speaker relevance of each frame. An automatic segmentation provides quasi-stationary segments of variable length; a weight is assigned to each frame as a function of its position within its segment, and a weighted mean of the likelihood ratio is then computed. Experiments are performed on the NIST 2003 speaker evaluation database. They show that the frames near segment boundaries, that is to say the transient ones, are more speaker-relevant than the middle frames of long segments, which correspond to the steady parts of the phones.
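
The weighted scoring described above can be illustrated by a minimal sketch. It assumes that per-frame log-likelihood ratios and segment lengths are already available. The weighting rule (a fixed higher weight for frames within a few frames of a segment boundary) and the names `frame_weights`, `weighted_score`, `boundary_width`, `edge_weight` and `middle_weight` are hypothetical choices made for illustration only; they are not the weighting function used in this paper.

```python
import numpy as np


def frame_weights(segment_lengths, boundary_width=2,
                  edge_weight=2.0, middle_weight=1.0):
    """Assign a weight to each frame from its position inside its segment.

    Hypothetical rule (an assumption, not the paper's): frames closer than
    `boundary_width` frames to a segment boundary get `edge_weight`,
    all other frames get `middle_weight`.
    """
    weights = []
    for length in segment_lengths:
        pos = np.arange(length)
        # Distance, in frames, to the nearest edge of the segment.
        dist = np.minimum(pos, length - 1 - pos)
        weights.append(np.where(dist < boundary_width,
                                edge_weight, middle_weight))
    return np.concatenate(weights)


def weighted_score(frame_llr, weights, speech_mask=None):
    """Weighted mean of per-frame log-likelihood ratios.

    Frames flagged as non-speech in `speech_mask` receive zero weight,
    which mirrors their rejection in the unweighted baseline.
    """
    frame_llr = np.asarray(frame_llr, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if speech_mask is not None:
        weights = weights * np.asarray(speech_mask, dtype=float)
    total = weights.sum()
    if total == 0:
        raise ValueError("no frame has a non-zero weight")
    return float(np.dot(weights, frame_llr) / total)
```

With all weights equal, `weighted_score` reduces to the usual average over speech frames, so the baseline is the special case `edge_weight == middle_weight`.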