I-vector-based speaker identification with extremely short utterances for both training and testing

Voice applications often need to respond appropriately by identifying the user or user type from an extremely short utterance, such as a single word. However, identification performance is expected to degrade as utterance length decreases. In this paper, we examine the performance of speaker identification for extremely short utterances of less than two seconds and study the relationship between accuracy and utterance length. Moreover, we show that identification accuracy can be improved by selecting speakers similar to the target user from a large speech corpus.