Speaker recognition based on idiolectal differences between speakers

“Familiar” speaker information is explored using non-acoustic features in NIST’s new “extended data” speaker detection task.[1] Word unigrams and bigrams, used in a traditional target/background likelihood ratio framework, are shown to give surprisingly good performance, and performance continues to improve with additional training and/or test data. Bigram performance is also found to depend on the sex and age difference between target and model. These initial experiments suggest that further exploration of “familiar” speaker characteristics is a valuable research direction for recognizing speakers in conversational speech.
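
To make the target/background likelihood ratio idea concrete, here is a minimal Python sketch that scores a test transcript with word bigrams. It is an illustration under stated assumptions, not the scoring procedure used in the experiments: whitespace tokenization, add-alpha smoothing, per-bigram length normalization, and all function names (`train_model`, `llr_score`, and so on) are hypothetical choices made for this example.

```python
import math
from collections import Counter


def bigrams(tokens):
    """Return the list of adjacent word pairs in a token sequence."""
    return list(zip(tokens, tokens[1:]))


def train_model(transcripts):
    """Count word bigrams over a list of transcript strings.

    Assumption: lowercasing plus whitespace splitting stands in for
    whatever tokenization a real system would use.
    """
    counts = Counter()
    for text in transcripts:
        counts.update(bigrams(text.lower().split()))
    return counts


def log_prob(bigram, counts, total, vocab_size, alpha):
    """Add-alpha smoothed log probability of one bigram (assumed smoothing)."""
    return math.log((counts[bigram] + alpha) / (total + alpha * vocab_size))


def llr_score(test_text, target_counts, background_counts, alpha=1.0):
    """Average log likelihood ratio of the test bigrams, target vs. background.

    Positive scores mean the test transcript looks more like the target
    speaker's word usage than like the background population's.
    """
    test = bigrams(test_text.lower().split())
    if not test:
        return 0.0
    # Shared bigram inventory so both models are smoothed over the same space.
    vocab_size = len(set(target_counts) | set(background_counts) | set(test))
    target_total = sum(target_counts.values())
    background_total = sum(background_counts.values())
    score = 0.0
    for bg in test:
        score += log_prob(bg, target_counts, target_total, vocab_size, alpha)
        score -= log_prob(bg, background_counts, background_total, vocab_size, alpha)
    return score / len(test)


# Toy data, invented purely for illustration.
target_model = train_model(["you know i mean it", "you know i think so"])
background_model = train_model(["the weather is nice today", "we went to the store"])

print(llr_score("you know i mean", target_model, background_model))      # positive
print(llr_score("the weather is nice", target_model, background_model))  # negative
```

In this toy setup, a transcript that reuses the target speaker's habitual word pairs scores above zero, and one that matches the background material scores below zero. A detection decision would then compare the score against a threshold tuned on held-out data.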