Improved estimation of femininity using GMM supervectors and SVR for voice therapy of Gender Identity Disorder clients

This paper proposes a new method for estimating the perceptual femininity (PF) of an input utterance using Gaussian mixture model (GMM) supervectors and support vector regression (SVR). The method is used to build a femininity estimation tool for voice therapy of Gender Identity Disorder (GID) clients, especially MtF (male-to-female) transsexuals. In our previous study [1], we developed a PF estimator in which a male GMM and a female GMM were trained for spectral features and another pair for pitch features, and the four resulting likelihood scores of an input utterance were combined by linear regression to estimate PF. In this work, inspired by recent speaker recognition models [2], we replace the four likelihood scores with supervectors formed from a spectral GMM and a pitch GMM estimated from the input utterance. Furthermore, instead of simple linear regression, we use SVR, a discriminative form of linear regression. Experiments on an MtF speech corpus show that the proposed method improves the correlation between human and machine PF scores and also reduces the squared prediction error.
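The pipeline described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not say how the utterance-level GMMs are estimated, so the sketch assumes the approach common in speaker recognition: a background GMM is trained on pooled frames, its means are MAP-adapted to each utterance, and the adapted means are stacked into a supervector. The function names (`map_supervector`, `utterance_vector`, `make_toy_utterance`), the relevance factor, the mixture sizes, the use of scikit-learn, and the synthetic data standing in for the MtF corpus are all assumptions made for illustration.

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.svm import SVR


def map_supervector(ubm, frames, relevance=16.0):
    """Stack the MAP-adapted component means of `ubm` for one utterance.

    Assumption: mean-only MAP adaptation with a fixed relevance factor.
    """
    post = ubm.predict_proba(frames)              # (T, K) responsibilities
    n_k = post.sum(axis=0)                        # soft frame count per component
    first = post.T @ frames                       # (K, D) first-order statistics
    ex = first / np.maximum(n_k, 1e-10)[:, None]  # posterior-weighted frame means
    alpha = (n_k / (n_k + relevance))[:, None]    # adaptation weight per component
    adapted = alpha * ex + (1.0 - alpha) * ubm.means_
    return adapted.ravel()


def utterance_vector(spec_ubm, pitch_ubm, spec_frames, pitch_frames):
    """Concatenate the spectral and pitch supervectors of one utterance."""
    return np.concatenate([
        map_supervector(spec_ubm, spec_frames),
        map_supervector(pitch_ubm, pitch_frames),
    ])


def make_toy_utterance(rng, score, n_frames=200, spec_dim=12):
    """Synthetic stand-in for real features: means shift with the PF score."""
    spec = rng.normal(loc=score, scale=1.0, size=(n_frames, spec_dim))
    pitch = rng.normal(loc=2.0 * score, scale=0.5, size=(n_frames, 1))
    return spec, pitch


rng = np.random.default_rng(0)
scores = rng.uniform(-1.0, 1.0, size=60)          # hypothetical human PF scores
utts = [make_toy_utterance(rng, s) for s in scores]

# Background models trained on frames pooled over all utterances.
spec_ubm = GaussianMixture(n_components=4, covariance_type="diag",
                           random_state=0).fit(np.vstack([u[0] for u in utts]))
pitch_ubm = GaussianMixture(n_components=4, covariance_type="diag",
                            random_state=0).fit(np.vstack([u[1] for u in utts]))

# One fixed-length vector per utterance: 4*12 spectral + 4*1 pitch dimensions.
X = np.array([utterance_vector(spec_ubm, pitch_ubm, s, p) for s, p in utts])

# Linear-kernel SVR maps supervectors to PF scores.
train, test = slice(0, 45), slice(45, 60)
svr = SVR(kernel="linear", C=1.0, epsilon=0.1).fit(X[train], scores[train])
pred = svr.predict(X[test])

# The two evaluation measures named in the abstract.
corr = np.corrcoef(pred, scores[test])[0, 1]
mse = np.mean((pred - scores[test]) ** 2)
print(f"correlation = {corr:.3f}, squared error = {mse:.3f}")
```

With real data, the toy generator would be replaced by spectral and pitch features extracted from each utterance, and the regression targets by listener ratings of PF. The numbers printed here reflect only the synthetic data and say nothing about the results reported in the paper.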