The statistical structures of male and female speech signals

The goal of this paper is to learn statistical features of gender-specific speech signals. The features are adapted by finding basis functions that encode the speech signal such that the resulting coefficients are statistically independent and information redundancy is minimized. We use a flexible independent component analysis (ICA) algorithm to adapt the basis functions and the source coefficients separately for male and female speakers. The learned features differ significantly in frequency and time span. Our results suggest that male speech features can be described by Gabor-like wavelet filters, whereas female speech features have a much longer time span. We present a detailed time-frequency analysis that strongly suggests these features can be used to qualify and quantify gender-specific differences in speech signals.
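To make the idea of learning basis functions with statistically independent coefficients concrete, the following is a minimal sketch under the linear model x = A s, where the columns of A are the basis functions and s holds the coefficients. It is not the flexible ICA algorithm used in the paper: it substitutes a plain symmetric FastICA update with a tanh nonlinearity, and it runs on synthetic Laplacian sources rather than speech. All names (`frame_signal`, `whiten`, `sym_decorrelate`, `fastica`) and all parameter values are illustrative assumptions.

```python
import numpy as np

def frame_signal(signal, frame_len):
    """Cut a 1-D signal into non-overlapping frames; returns (frame_len, n_frames)."""
    n_frames = len(signal) // frame_len
    return signal[: n_frames * frame_len].reshape(n_frames, frame_len).T

def whiten(X):
    """Center X (n_features, n_samples) and decorrelate it to unit variance."""
    Xc = X - X.mean(axis=1, keepdims=True)
    cov = Xc @ Xc.T / Xc.shape[1]
    eigvals, eigvecs = np.linalg.eigh(cov)
    V = eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.T
    return V @ Xc, V

def sym_decorrelate(W):
    """Symmetric orthogonalization: W <- (W W^T)^(-1/2) W."""
    s, u = np.linalg.eigh(W @ W.T)
    return u @ np.diag(s ** -0.5) @ u.T @ W

def fastica(X, n_iter=200, tol=1e-6, seed=0):
    """Symmetric FastICA with a tanh nonlinearity (a stand-in, not flexible ICA).

    Returns (basis, unmixing, coeffs) with coeffs = unmixing @ centered X.
    """
    rng = np.random.default_rng(seed)
    Z, V = whiten(X)
    n = Z.shape[0]
    W = sym_decorrelate(rng.standard_normal((n, n)))
    for _ in range(n_iter):
        Y = W @ Z
        G = np.tanh(Y)
        G_prime = 1.0 - G ** 2
        # Fixed-point update per row: E[z g(w^T z)] - E[g'(w^T z)] w
        W_new = G @ Z.T / Z.shape[1] - np.diag(G_prime.mean(axis=1)) @ W
        W_new = sym_decorrelate(W_new)
        # Converged when each row is (anti)parallel to its previous value
        delta = np.max(np.abs(np.abs(np.sum(W_new * W, axis=1)) - 1.0))
        W = W_new
        if delta < tol:
            break
    unmixing = W @ V
    basis = np.linalg.inv(unmixing)  # columns play the role of basis functions
    coeffs = unmixing @ (X - X.mean(axis=1, keepdims=True))
    return basis, unmixing, coeffs

# Synthetic demonstration: sparse (Laplacian) sources mixed by a random matrix.
rng = np.random.default_rng(1)
S = rng.laplace(size=(4, 20000))   # assumed super-Gaussian coefficients
A = rng.standard_normal((4, 4))    # assumed ground-truth basis
X = A @ S
basis, unmixing, coeffs = fastica(X)
```

With real recordings, one would presumably build `X` by passing each speaker group's waveform through `frame_signal` and then run the estimator once per group, so that the columns of each `basis` can be compared in time span and frequency content. ICA recovers the basis only up to permutation and scaling of its columns, so any comparison has to be invariant to both.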
