Anchor Models and WCCN Normalization For Speaker Trait Classification

This paper presents an improved version of the anchor model, applied to the two-class classification tasks of the INTERSPEECH 2012 Speaker Trait Challenge. To build the anchor model space for each task, we include the class models of all tasks. Applying within-class covariance normalization (WCCN) to the log-likelihood scores of the anchor space not only improves on the unnormalized version but also exceeds the performance of GMM and GMM-UBM systems. Although Euclidean distance performs worse than the cosine metric, we find that after normalization both metrics give similar results, so they can be used interchangeably.
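As a rough illustration of the normalization step described above, the following sketch estimates a WCCN projection from anchor-space score vectors and compares two vectors with cosine and Euclidean scoring. It is not the authors' implementation: the function names, the `ridge` regularizer, and the convention of one row per utterance (one log-likelihood score per anchor model) are assumptions made for the example.

```python
import numpy as np

def wccn_projection(scores, labels, ridge=1e-6):
    """Estimate a WCCN projection B such that B @ B.T = inv(W).

    scores: (n_utterances, n_anchors) array of anchor log-likelihood scores.
    labels: (n_utterances,) class label of each utterance.
    ridge:  small diagonal term (an assumption) that keeps W invertible.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    dim = scores.shape[1]

    # Average the per-class covariance matrices to get W.
    W = np.zeros((dim, dim))
    for c in classes:
        x = scores[labels == c]
        centered = x - x.mean(axis=0)
        W += centered.T @ centered / len(x)
    W /= len(classes)
    W += ridge * np.eye(dim)

    # Cholesky factor of the inverse within-class covariance.
    W_inv = np.linalg.inv(W)
    W_inv = 0.5 * (W_inv + W_inv.T)  # remove numerical asymmetry
    return np.linalg.cholesky(W_inv)

def cosine_score(a, b):
    """Cosine similarity between two anchor-space vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def euclidean_score(a, b):
    """Negative Euclidean distance, so that larger means more similar."""
    return -float(np.linalg.norm(a - b))
```

With row vectors, the normalized representation is `scores @ B`. In the projected space the average within-class covariance is approximately the identity, which is one plausible reason the two metrics behave similarly after normalization.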
