Factor analysis subspace estimation for speaker verification with short utterances

Training the speaker and session subspaces is a central problem in developing a joint factor analysis (JFA) GMM speaker verification system. This work investigates and compares several alternative procedures for this task, with a particular focus on training and testing with short utterances. Experiments show that better performance is obtained when the two core variability subspaces are optimised independently rather than simultaneously. It is additionally shown that, for verification trials on short utterances, it is important to train the session subspace on utterances of matched length. Conversely, the speaker subspace should always be trained with as much data as possible.
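
For orientation, the two subspaces discussed above can be placed in the supervector decomposition commonly used in joint factor analysis. The notation below is the conventional one and is an assumption here, not taken from this abstract: m is the speaker- and session-independent UBM mean supervector, V is the low-rank speaker subspace matrix, U is the low-rank session subspace matrix, D is a diagonal residual matrix, and y_s, z_s and x_{s,h} are latent factors with standard normal priors for speaker s and session h.

```latex
\[
  M_{s,h} = m + V y_s + D z_s + U x_{s,h},
  \qquad
  y_s,\; z_s,\; x_{s,h} \sim \mathcal{N}(0, I)
\]
```

Under this reading, "independent optimisation" means estimating V and U in separate passes, possibly on different data, rather than fitting both jointly.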